Abstract

Next to the shortest path distance, the second most popular distance function between
vertices in a graph is the commute distance (resistance distance). For two vertices u and
v, the hitting time $H_{uv}$ is the expected time it takes a random walk to travel from u to v.
The commute time is its symmetrized version $C_{uv} = H_{uv} + H_{vu}$. In our paper we study
the behavior of hitting times and commute distances when the number n of vertices in the
graph is very large. We prove that as $n \to \infty$, under mild assumptions, hitting times and
commute distances converge to expressions that do not take into account the global structure
of the graph at all. Namely, the hitting time $H_{uv}$ converges to $1/d_v$ and the commute time to
$1/d_u + 1/d_v$, where $d_u$ and $d_v$ denote the degrees of vertices u and v. In these cases, the hitting
and commute times are misleading in the sense that they do not provide information about the
structure of the graph. We focus on two major classes of random graphs: random geometric
graphs (kNN-graphs, ε-graphs, Gaussian similarity graphs) and random graphs with given
expected degrees (in particular, Erdős-Rényi graphs with and without planted partitions).

1 Introduction 

Given an undirected, weighted graph G = (V, E) with n vertices, the commute distance between
two vertices u and v is defined as the expected time it takes the natural random walk starting in
vertex u to travel to vertex v and back to u. It is equivalent (up to a constant) to the resistance
distance, which interprets the graph as an electrical network and defines the distance between
vertices u and v as the effective resistance between these vertices. See below for exact definitions.

The commute distance has many nice properties, both from a theoretical and a practical point of
view. It is a Euclidean distance function and can be computed in closed form. As opposed to the 
shortest path distance, it takes into account all paths between u and v, not just the shortest one. 
As a rule of thumb, the more paths connect u with v, the smaller their commute distance becomes. 
Hence it supposedly satisfies the following, highly desirable property: 

Property (★): Vertices in the same "cluster" of the graph have a small commute distance,
whereas vertices in different clusters of the graph have a large commute distance
to each other.

Consequently, the commute distance is considered a convenient tool to encode the cluster structure 
of the graph. 

In this paper we study how the commute distance behaves when the size of the graph increases. 
Our main result is that if the graph is large enough, then in many graphs the hitting times and 
commute distances can be approximated by an extremely simple formula with very high accuracy. 
Namely, denoting by $H_{uv}$ the expected hitting time and by $C_{uv}$ the commute distance between
two vertices u and v, by $d_u$ the degree of vertex u, and by vol(G) the volume of the graph, we
show that if the graph gets large enough, for all vertices $u \neq v$,

$$\frac{1}{\mathrm{vol}(G)} H_{uv} \approx \frac{1}{d_v} \qquad \text{and} \qquad \frac{1}{\mathrm{vol}(G)} C_{uv} \approx \frac{1}{d_u} + \frac{1}{d_v}.$$

On the one hand, we prove these results for arbitrary fixed, large graphs (Proposition 5). Here
the quality of the approximation depends on geometric quantities describing the graph (such as
minimal and maximal degrees, the spectral gap, and so on). On the other hand, in the main part
of the paper we prove that the results hold with probability tending to 1, as $n \to \infty$, in all
major classes of random graphs:
random geometric graphs (k-nearest neighbor graphs, ε-graphs, and Gaussian similarity graphs)
and random graphs with given expected degrees (in particular, also Erdős-Rényi graphs with
and without planted partitions). As a rule of thumb, our approximation results hold whenever the
minimal degree in the graph increases with n (for example, as log(n) in random geometric graphs
or as a power of log(n) in random graphs with given expected degrees).

In order to make our results as accessible as possible to a wide range of computer scientists, we 
present two different strategies to prove our results: one based on flow arguments on electrical 
networks and another based on spectral arguments. While the former approach leads to tighter 
bounds, the latter is more general. An important step on the way is that we prove bounds on the 
spectral gap in all classes of random geometric graphs. This is interesting by itself as the spectral 
gap governs many important properties and processes on graphs. In this generality, the bounds on 
the spectral gaps are new. 

Our results have important consequences. 

Hitting and commute times in large graphs are often misleading. On the negative side, 
our approximation result shows that contrary to popular belief, the commute distance does not 
take into account any global properties of the data, at least if the graph is "large enough" . It just 
considers the local density (the degree of the vertex) at the two vertices, nothing else. The resulting 
large sample commute distance $\mathrm{dist}(u,v) = 1/d_u + 1/d_v$ is completely meaningless as a distance
on a graph. For example, all data points have the same nearest neighbor (namely, the vertex with
the largest degree), the same second-nearest neighbor (the vertex with the second-largest degree),
and so on. In particular, one of the main motivations to use the commute distance, Property
(★), no longer holds when the graph becomes large enough. Even more disappointingly, computer
simulations show that n does not even need to be very large before (★) breaks down. Often, n in
the order of 1000 is already enough to make the commute distance very close to its approximation 
expression. This effect is even stronger if the dimensionality of the underlying data space is large. 
Consequently, even on moderate-sized graphs, the use of the raw commute distance should be 
discouraged. 

Efficient computation of approximate commute distances. In some applications the commute
distance is not used as a distance function, but as a tool to encode the connectivity properties
of a graph, for example in graph sparsification (Spielman and Srivastava, 2008), or when computing
bounds on mixing or cover times (Aleliunas et al., 1979; Chandra et al., 1989; Avin and Ercal;
Cooper and Frieze, 2009), or graph labeling (Herbster and Pontil, 2006; Cesa-Bianchi et al.,
2009). To obtain the commute distance between all points in a graph one has to compute the
pseudo-inverse of the graph Laplacian matrix, an operation of time complexity $O(n^3)$. This is
prohibitive in large graphs. To circumvent the matrix inversion, several approximations of the
commute distance have been suggested in the literature (Spielman and Srivastava, 2008; Sarkar
and Moore, 2007; Brand, 2005). Our results lead to a much simpler and well-justified way of
approximating the commute distance on large random geometric graphs.

Figure 1: Electrical network intuition: The effective resistance between s and t is dominated by the
edges adjacent to s and t.

We start our paper with Section 2, which tries to convey our main results and techniques on a very
high level. Then, after introducing general definitions and notation (Section 3), we present our
main results in Section 4. This section is divided into two parts (flow based part and spectral
part). All proofs are presented in Sections 5 and 6. A final discussion can be found in Section 7.
For the convenience of the reader, some basic facts on random geometric graphs are presented in
the appendix. Parts of this work build on our conference paper von Luxburg et al. (2010).



2 Intuition about our results and proofs 

Before diving into technicalities, we would like to present our results in an intuitive, non-technical 
way. Readers interested in crisp theorems are encouraged to skip this section right away. 

Informally the main result of our paper is the following: 

Main result: Consider a "large" graph that is "reasonably strongly" connected. In such a graph,
the hitting times and commute distances between any two vertices u and v can be approximated by
the simple expressions

$$\frac{1}{\mathrm{vol}(G)} H_{uv} \approx \frac{1}{d_v} \qquad \text{and} \qquad \frac{1}{\mathrm{vol}(G)} C_{uv} \approx \frac{1}{d_u} + \frac{1}{d_v}.$$

In this section we want to present some intuitive arguments to understand why this makes sense. 
In order to show a broad picture and to make our results accessible to a general audience, we are 
going to present two completely different approaches in our paper. 

2.1 Electrical network intuition 

Consider an unweighted graph as an electrical network where each edge has resistance 1. We want
to compute the effective resistance between two fixed vertices s and t by exploiting the electrical
laws. Resistances in series add up, that is for two resistances $R_1$ and $R_2$ in series we get the overall
resistance $R = R_1 + R_2$. Resistances in parallel lines satisfy $1/R = 1/R_1 + 1/R_2$. Now consult
the unweighted electrical network in Figure 1. Consider the vertex s and all edges from s to its $d_s$
neighbors. The resistance "spanned" by these $d_s$ parallel edges satisfies $1/R = \sum_{i=1}^{d_s} 1 = d_s$, that
is $R = 1/d_s$. Similarly for t. Between the neighbors of s and the ones of t there are very many
paths. It turns out that the contribution of these paths to the resistance is negligible (essentially,
we have so many wires between the two neighborhoods that electricity can flow nearly freely). So
the overall effective resistance between s and t is dominated by the edges adjacent to s and t with
contributions $1/d_s + 1/d_t$.
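
The claim that the edges adjacent to s and t dominate the effective resistance can be checked exactly on small examples. The following is a minimal sketch in Python and is not taken from the paper: the helper names (`solve`, `effective_resistance`, `complete_graph`, `path_graph`) are hypothetical, and the two test graphs are chosen purely for illustration. It grounds t, injects a unit current at s, and solves Kirchhoff's equations in exact rational arithmetic.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def effective_resistance(W, s, t):
    """Effective resistance between s != t; W[i][j] is the conductance of edge ij."""
    n = len(W)
    keep = [i for i in range(n) if i != t]      # ground vertex t (potential 0)
    deg = [sum(row) for row in W]
    # Graph Laplacian L = D - W with the row and column of t removed.
    L = [[(deg[i] if i == j else 0) - W[i][j] for j in keep] for i in keep]
    b = [1 if i == s else 0 for i in keep]      # unit current injected at s
    potentials = solve(L, b)
    return potentials[keep.index(s)]            # potential at s equals R_st

def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

def path_graph(n):
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

# Dense graph: many parallel paths between the neighborhoods of s and t.
print(effective_resistance(complete_graph(10), 0, 1), "vs", Fraction(2, 9))  # 1/5 vs 2/9
# Sparse graph: a single path, so the "many wires" argument does not apply.
print(effective_resistance(path_graph(4), 0, 3), "vs", Fraction(2, 1))       # 3 vs 2
```

In the complete graph the exact resistance 2/n and the approximation $1/d_s + 1/d_t = 2/(n-1)$ nearly agree, while on the path the approximation is off by a constant factor.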

The main theorems derived from the electrical network approach are Theorems 3 and 4. In order
to prove them, we bound the electrical resistance between two vertices using flow arguments. The
overall idea is that we construct a unit flow between s and t that uses as many paths as possible.
From the technical side, this approach has the advantage that we can throw away irrelevant parts
of the graph: we can concentrate on a "valid region" that contains s, t, and many paths between
s and t. For this reason, we need fewer assumptions on the geometry of the underlying space "close
to its boundary". We explicitly construct such flows for random geometric graphs. The idea is to
place a grid on the underlying space and control the flow between different cells of the grid.

As far as we can see, this technique can only be used to bound the resistance distance $R_{ij}$; it does
not work for the individual hitting times $H_{ij}$ or $H_{ji}$.

Figure 2: Random walk intuition: Between its start and target vertex (black crosses), the random
walk wanders around so long that by the time it finally arrives at its target it has already "forgotten"
where it started from.



2.2 Random walk intuition 

Another approach to understand our convergence results is based on random walks. Essentially,
our results for the hitting times $H_{uv}$ say that regardless at which vertex u we start, the time to
hit vertex v just depends on the degree of v. What happens is that as the graph gets large, the
random walk can explore so many paths that by the time it is close to v it "has forgotten" where
it came from (cf. Figure 2). This is why the hitting time does not depend on u. Once the random
walk is in the vicinity of v, the question is just whether it exactly hits v or whether it passes close
to v without hitting it. Intuitively, the expected time to hit v is inversely proportional to the density
of the graph close to v: if there are many edges in the neighborhood of v, then it is easier to hit v
than if there are only few edges. This is how the inverse degree comes into play.

Stated slightly differently, the random walk has already mixed before it hits v. For this reason, the
hitting time does not depend on u. All that is left is some component depending on v. Notably,
this component exactly coincides with the mean return time of v (the expected time it takes a
random walk that starts at v to return to v), which is given as $\mathrm{vol}(G)/d_v$.
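
The relation between hitting times and the mean return time vol(G)/d_v can be verified exactly on a toy graph. The sketch below is an illustration added here, not the authors' code: the function names and the four-vertex example graph are hypothetical. It computes hitting times by first-step analysis, that is, by solving $H_{uv} = 1 + \sum_w P_{uw} H_{wv}$ for $u \neq v$ with $H_{vv} = 0$.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def hitting_times(W, v):
    """List H with H[u] = expected time to reach v from u (integer weights W)."""
    n = len(W)
    deg = [sum(row) for row in W]
    others = [u for u in range(n) if u != v]
    # For u != v:  H_uv - sum_{w != v} P_uw H_wv = 1,  with P_uw = W_uw / d_u.
    A = [[(1 if u == w else 0) - Fraction(W[u][w], deg[u]) for w in others]
         for u in others]
    h = solve(A, [1] * len(others))
    H = [Fraction(0)] * n
    for u, value in zip(others, h):
        H[u] = value
    return H

def mean_return_time(W, v):
    """Expected time for a walk started at v to come back to v."""
    deg_v = sum(W[v])
    H = hitting_times(W, v)
    # One step to a neighbor w (probability W[v][w] / deg_v), then hit v from w.
    return 1 + sum(Fraction(W[v][w], deg_v) * H[w] for w in range(len(W)))

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
vol = sum(map(sum, W))                      # sum of all degrees, here 8
for v in range(4):
    print(v, mean_return_time(W, v), Fraction(vol, sum(W[v])))
```

On such a tiny graph the hitting times to vertex 3 (9, 9 and 7) still depend on the start vertex; the statement of the paper is that this dependence vanishes, relative to vol(G)/d_v, in large and well-connected graphs.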

In the light of our explanation it is reasonable to expect that the quality of our approximation 
depends on the mixing time of the random walk, and the latter is known to be governed by the 
size of the spectral gap, in particular the quantity $1 - \lambda_2$ (see below for exact definitions). Indeed,
we will see in our Key Proposition 5 that $1 - \lambda_2$ is exactly the quantity that governs the deviation
bound for the hitting and commute times. If $1 - \lambda_2$ is small, then the graph is too well-clustered,
has a large mixing time, and our approximation guarantee gets worse.

The spectral approach leads to the main theorems in Section 4.2. We first have to express the
commute time in terms of a spectral representation of the graph (Proposition 5). To make use of
this proposition we need a lower bound on the spectral gap $1 - \lambda_2$ of the graph.
To bound the spectral gap in random geometric graphs we use path-based arguments as well,
namely we use the canonical path technique of Diaconis and Stroock (1991). Here one has to
construct a set of "canonical paths" between each pair of vertices in the graph. The goal is to
distribute these paths "as well as possible" over the graph. As in the case above we use a grid
to control the paths between different cells of this grid. This is very reminiscent of the technique
described above. However, an important difference is that we now need to consider paths between
all pairs of points (we have to bound the spectral gap of the whole graph) instead of just paths
between s and t. In the language of flows, instead of looking at a unit flow from s to t we would
have to use multi-commodity flows between all pairs of vertices instead of a single flow from s to t
(cf. Sinclair, 1992; Diaconis and Saloff-Coste, 1993). For this reason, we need stronger assumptions
on the geometry of the underlying space.



For the case of random graphs with expected degrees, we build on results about the spectral gap 
from the literature. As in the electrical network approach, we need to ensure that these graphs are 
"strongly enough" connected. This will be achieved by requiring that the minimal vertex degree 
in the graph is "large enough" (with respect to the number n of vertices). Our results hold for any 
arbitrary degree distribution, as soon as the minimal degree grows slowly with n. 
The advantage of the spectral approach is that it is very general. It works for any kind of graph,
and as opposed to the electrical network approach can also be used to treat the hitting times
directly. The technical disadvantage is that we cannot "throw away" irrelevant parts of the graph as
in the electrical network approach (because no part of the graph is irrelevant to the gap), leading
to slightly worse bounds.



2.3 General limitations 

There are two major limitations to our results: 

• Our approximation results only hold if the graph is "reasonably strongly" connected and
does not have too large a bottleneck. This ensures that the overall behavior of the commute
distance cannot be dominated by a single edge. We can see this in both approaches.
In the electrical network approach, the argument that "electricity can nearly flow without
resistance" on the "many paths" breaks down if there is a strong bottleneck between u and
v which all electricity has to pass. In the spectral approach, a strong bottleneck leads to a
small spectral gap, and then the bounds become meaningless as well.

• Our results only hold if the minimal degree in the graph is "reasonably large", compared to
the number n of vertices. For example, in the random graph models the minimal degree has
to grow slowly with n, say as log n. This is to ensure that there are no single vertices that
can have extremely high influence on the commute distance.

The downside of this condition is that our results do not hold for power law graphs in which 
the smallest degree is constant. 
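
The first limitation can be made concrete with a small worked example, added here for illustration and not part of the original analysis. Take two complete graphs $K_m$ joined by a single bridge edge $\{a, b\}$, with s in the first clique, t in the second, and $s \neq a$, $t \neq b$. All current from s to t has to pass through the bridge, so the three pieces are in series, and the effective resistance between two vertices of $K_m$ is $2/m$:

```latex
\begin{align*}
R_{st} &= R^{K_m}_{sa} + R_{ab} + R^{K_m}_{bt} = \frac{2}{m} + 1 + \frac{2}{m}
        \;\longrightarrow\; 1 \quad (m \to \infty), \\
\frac{1}{d_s} + \frac{1}{d_t} &= \frac{2}{m-1} \;\longrightarrow\; 0 \quad (m \to \infty).
\end{align*}
```

The degrees grow with the graph, yet the resistance stays dominated by the single bridge edge, so the approximation $1/d_s + 1/d_t$ fails.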

As presented in this intuitive section, it nearly sounds as if our results were obvious. Indeed, in 
hindsight they seem to be obvious, and this is part of why we like our results so much: they were 
very surprising when we found them, but can be made plausible to a wide range of people. We 
would like to stress that all these results were not known before our work, and that the "intuitive 
explanations" have to be seen as the succession of our technical work. In particular, the technical 
work presented in the rest of this paper makes explicit all the sloppy terms like "reasonably 
connected" and "large enough" . 

3 General setup, definitions and notation 

We consider undirected graphs G = (V, E) that are connected and not bipartite. By n we denote
the number of vertices. The adjacency matrix is denoted by $W := (w_{ij})_{i,j=1,\dots,n}$. In case the
graph is weighted, this matrix is also called the weight matrix. All weights are assumed to be
non-negative. The minimal and maximal weights in the graph are denoted by $w_{\min}$ and $w_{\max}$.
By $d_i := \sum_{j=1}^{n} w_{ij}$ we denote the degree of vertex $v_i$. The diagonal matrix D with diagonal
entries $d_1, \dots, d_n$ is called the degree matrix, the minimal and maximal degrees are denoted $d_{\min}$
and $d_{\max}$. The unnormalized graph Laplacian is given as $L := D - W$, the normalized one as
$L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2}$. Consider the natural random walk on G. Its transition matrix is given
as $P = D^{-1} W$. It is well-known that $\lambda$ is an eigenvalue of $L_{\mathrm{sym}}$ if and only if $1 - \lambda$ is an
eigenvalue of P. By $1 = \lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n > -1$ we denote the eigenvalues of P. The quantity
$1 - \max\{\lambda_2, |\lambda_n|\}$ is called the spectral gap of P.
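
As a worked illustration of these definitions (an example added here, not one from the source), consider the unweighted complete graph $K_n$ with $n \geq 3$, which is connected and not bipartite. Writing J for the all-ones matrix, and taking vol(G) to be the sum of all degrees, one obtains:

```latex
\begin{align*}
W &= J - I, \qquad d_i = n - 1, \qquad \mathrm{vol}(G) = \sum_{i} d_i = n(n-1), \\
P &= D^{-1} W = \tfrac{1}{n-1}\,(J - I), \qquad
L_{\mathrm{sym}} = I - \tfrac{1}{n-1}\,(J - I), \\
\text{eigenvalues of } P &: \quad \lambda_1 = 1, \qquad \lambda_2 = \dots = \lambda_n = -\tfrac{1}{n-1}, \\
\text{eigenvalues of } L_{\mathrm{sym}} &: \quad 0 \ \text{ and } \ 1 + \tfrac{1}{n-1} = \tfrac{n}{n-1}, \\
\text{spectral gap} &: \quad 1 - \max\{\lambda_2, |\lambda_n|\} = 1 - \tfrac{1}{n-1} = \tfrac{n-2}{n-1}.
\end{align*}
```

The eigenvalues follow from those of J, which are n (once) and 0 (with multiplicity n - 1), and they match the correspondence between the spectra of $L_{\mathrm{sym}}$ and P stated above.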

The hitting time $H_{uv}$ is defined as the expected time it takes a random walk starting in vertex u to
travel to vertex v (where $H_{uu} = 0$ by definition). The commute distance (commute time) between
u and v is defined as $C_{uv} := H_{uv} + H_{vu}$. Recall that for a symmetric, non-invertible matrix A
its Moore-Penrose inverse is defined as $A^{\dagger} := (A + U)^{-1} - U$ where U is the projection on the
eigenspace corresponding to eigenvalue 0. It is well known that commute times can be expressed
in terms of the Moore-Penrose inverse $L^{\dagger}$ of the unnormalized graph Laplacian (e.g., Klein and
Randic, 1993; Xiao and Gutman, 2003; Fouss et al., 2006):

$$C_{ij} = \mathrm{vol}(G) \left\langle e_i - e_j,\; L^{\dagger} (e_i - e_j) \right\rangle,$$

where $e_i$ is the i-th unit vector in $\mathbb{R}^n$. The following representations for commute and hitting times
involving the pseudo-inverse $L_{\mathrm{sym}}^{\dagger}$ of the normalized graph Laplacian are less well known.

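Proposition 1 (Closed form expression for hitting and commute times) Let G be a connected,
undirected graph with n vertices. The hitting times $H_{ij}$, $i \neq j$, can be computed by

$$H_{ij} = \mathrm{vol}(G) \left\langle \frac{1}{\sqrt{d_j}} e_j,\; L_{\mathrm{sym}}^{\dagger} \left( \frac{1}{\sqrt{d_j}} e_j - \frac{1}{\sqrt{d_i}} e_i \right) \right\rangle,$$

and the commute times satisfy

$$C_{ij} = \mathrm{vol}(G) \left\langle \frac{1}{\sqrt{d_i}} e_i - \frac{1}{\sqrt{d_j}} e_j,\; L_{\mathrm{sym}}^{\dagger} \left( \frac{1}{\sqrt{d_i}} e_i - \frac{1}{\sqrt{d_j}} e_j \right) \right\rangle.$$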
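
The closed form for commute times in terms of the Moore-Penrose inverse of the unnormalized Laplacian can be tried out directly. The following sketch is an illustration under stated assumptions, not the authors' implementation: the function names and the example graph are hypothetical, the graph is assumed connected (so that the projection U onto the eigenspace of eigenvalue 0 has all entries equal to 1/n), and exact rational arithmetic is used instead of a numerical library.

```python
from fractions import Fraction

def inverse(A):
    """Exact inverse of a square matrix by Gauss-Jordan elimination."""
    n = len(A)
    # Augment A with the identity matrix.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def laplacian_pinv(W):
    """Moore-Penrose inverse of L = D - W, computed as (L + U)^{-1} - U."""
    n = len(W)
    deg = [sum(row) for row in W]
    u = Fraction(1, n)  # every entry of U for a connected graph
    shifted = [[(deg[i] if i == j else 0) - W[i][j] + u for j in range(n)]
               for i in range(n)]
    return [[x - u for x in row] for row in inverse(shifted)]

def commute_time(W, i, j):
    """C_ij = vol(G) * <e_i - e_j, L^+ (e_i - e_j)>."""
    G = laplacian_pinv(W)
    vol = sum(map(sum, W))
    return vol * (G[i][i] + G[j][j] - 2 * G[i][j])

# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(commute_time(W, 2, 3))  # the bridge {2, 3} has resistance 1, so C = vol(G) = 8
print(commute_time(W, 0, 3))  # resistance 2/3 + 1 = 5/3, so C = 40/3
```

Inverting an n x n matrix is what makes the exact commute distance expensive for large graphs; the example only serves to make the formula tangible.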



Closely related to the commute distance is the resistance distance. Here one interprets the graph
as an electrical network where the edges represent resistors. The conductance of a resistor is given
by the corresponding edge weight. The resistance distance $R_{uv}$ between two vertices u and v is
defined as the effective resistance between u and v in the network. It is well known that the
resistance distance coincides with the commute distance up to a constant: $C_{uv} = \mathrm{vol}(G) R_{uv}$. For
background reading on resistance and commute distances see Doyle and Snell (1984), Klein and
Randic (1993), Xiao and Gutman (2003), Fouss et al. (2006).



Our main focus in this paper is the class of geometric graphs. For a deterministic (fixed) geometric
graph we consider a fixed set of points $X_1, \dots, X_n \in \mathbb{R}^d$. These points form the vertices
$v_1, \dots, v_n$ of the graph. The edges in the graph are defined such that "neighboring points" are
connected. We consider the most popular types of random geometric graphs. In the ε-graph we
connect two points whenever their Euclidean distance is less than or equal to ε. In the undirected,
symmetric k-nearest neighbor graph we connect $v_i$ to $v_j$ if $X_i$ is among the k nearest neighbors of
$X_j$ or vice versa. In the mutual k-nearest neighbor graph we connect $v_i$ to $v_j$ if $X_i$ is among the k
nearest neighbors of $X_j$ and vice versa. Note that by default, the terms ε- and kNN-graph refer
to unweighted graphs in our paper. When we treat weighted graphs, we always make it explicit.
For a general similarity graph we build a weight matrix between all points based on a similarity
function $k : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}_{\geq 0}$, that is we define the weight matrix W with entries $w_{ij} = k(X_i, X_j)$
and consider the fully connected graph with weight matrix W. The most popular weight function
in applications is the Gaussian similarity function $w_{ij} = \exp(-\|X_i - X_j\|^2 / h^2)$, where $h > 0$ is a
bandwidth parameter.
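
The graph constructions described above can be sketched in a few lines. This is a minimal illustration, assuming points given as tuples of floats and distance ties broken by index order; the function names are hypothetical and the quadratic-time neighbor search is chosen for clarity, not efficiency.

```python
import math

def pairwise_distances(X):
    """Euclidean distance matrix of a list of points (tuples of floats)."""
    return [[math.dist(x, y) for y in X] for x in X]

def epsilon_graph(X, eps):
    """Unweighted eps-graph: edges (i, j), i < j, with distance at most eps."""
    D = pairwise_distances(X)
    n = len(X)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if D[i][j] <= eps}

def knn_graph(X, k, mutual=False):
    """Unweighted symmetric (default) or mutual k-nearest neighbor graph."""
    D = pairwise_distances(X)
    n = len(X)
    # nbrs[i] = indices of the k points closest to X[i], excluding i itself.
    nbrs = [set(sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k])
            for i in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            i_near_j, j_near_i = i in nbrs[j], j in nbrs[i]
            # symmetric: "or vice versa"; mutual: "and vice versa"
            connect = (i_near_j and j_near_i) if mutual else (i_near_j or j_near_i)
            if connect:
                edges.add((i, j))
    return edges

def gaussian_weights(X, h):
    """Weights w_ij = exp(-||X_i - X_j||^2 / h^2) of the fully connected graph."""
    return [[math.exp(-math.dist(x, y) ** 2 / h ** 2) for y in X] for x in X]

# Four points on the real line.
X = [(0.0,), (1.0,), (3.0,), (7.0,)]
print(sorted(epsilon_graph(X, 2.0)))           # [(0, 1), (1, 2)]
print(sorted(knn_graph(X, 1)))                 # [(0, 1), (1, 2), (2, 3)]
print(sorted(knn_graph(X, 1, mutual=True)))    # [(0, 1)]
```

The example shows how the mutual kNN-graph can be much sparser than the symmetric one: only the pair (0, 1) consists of two points that are each other's nearest neighbor.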



While these definitions make sense with any fixed set of vertices, we are most interested in the 
case of random geometric graphs. Here we assume that the underlying set of vertices $X_1, \dots, X_n$
has been drawn i.i.d. according to some probability density p on $\mathbb{R}^d$. Once the vertices are known,
the edges in the graphs are constructed as described above. In the random setting it is convenient 
to make regularity assumptions in order to be able to control quantities such as the minimal and 
maximal degrees. Sometimes we need to make these assumptions about the whole underlying space, 
sometimes just for a selected subset of it. Thus we introduce the following general definition. 

Definition 2 (Valid region) Let p be any density on $\mathbb{R}^d$. We call a connected subset $X \subset \mathbb{R}^d$ a
valid region if the following properties are satisfied:

1. The density on X is bounded away from 0, that is for all $x \in X$ we have that $p(x) \geq p_{\min} > 0$
for some constant $p_{\min}$.

2. X has "bottleneck" larger than some value $h > 0$: the set $\{x \in X : \mathrm{dist}(x, \partial X) > h/2\}$ is
connected (here $\partial X$ denotes the topological boundary of X).

3. The boundary of X is regular in the following sense. We assume that there exist positive
constants $\alpha > 0$ and $\varepsilon_0 > 0$ such that if $\varepsilon < \varepsilon_0$, then for all points $x \in \partial X$ we have
$\mathrm{vol}(B_\varepsilon(x) \cap X) \geq \alpha\, \mathrm{vol}(B_\varepsilon(x))$ (where vol denotes the Lebesgue volume). Essentially this
condition just excludes the situation where the boundary has arbitrarily thin spikes.

Sometimes we consider a valid region with respect to two points s, t. Here we additionally assume 
that s and t are interior points of X . 

In the spectral part of our paper, we always have to make a couple of assumptions that will 
be summarized by the term general assumptions. They are as follows: First we assume that 
X := supp(p) is a valid region according to Definition 2. Second, we assume that X does not contain
any holes and does not become arbitrarily narrow: there exists a homeomorphism $h : X \to [0,1]^d$
and constants $0 < L_{\min} < L_{\max} < \infty$ such that for all $x, y \in X$ we have

$$L_{\min} \|x - y\| \leq \|h(x) - h(y)\| \leq L_{\max} \|x - y\|.$$

This condition restricts X to be topologically equivalent to the cube. In applications this is not 
a strong assumption, as the occurrence of "holes" with vanishing probability density is unrealistic 
due to the presence of noise in the data generating process. More generally we believe that our 
results can be generalized to other homeomorphism classes, but refrain from doing so as it would 
substantially increase the amount of technicalities. 

In the following we denote the volume of the unit ball in $\mathbb{R}^d$ by $\eta_d$. For readability reasons, we
are going to state our main results using constants $c_i > 0$. These constants are independent of n
and the graph connectivity parameter (ε or k or h, respectively) but depend on the dimension, the
geometry of X , and p. The values of all constants are determined explicitly in the proofs. They 
are not the same in different propositions. 

4 Main results 

Our paper comprises two different approaches. In the first approach we analyze the resistance 
distance by flow based arguments. This technique is somewhat restrictive in the sense that it only 
works for the resistance distance itself (not the hitting times) and we only apply it to random 
geometric graphs. The advantage is that in this setting we obtain good convergence conditions 
and rates. The second approach is based on spectral arguments and is more general. It works for 
various kinds of graphs and can treat hitting times as well. This comes at the price of slightly 
stronger assumptions and worse convergence rates. 



4.1 Results based on flow arguments 

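Theorem 3 (Commute distance on ε-graphs) Let X be a valid region with bottleneck h and
minimal density $p_{\min}$. For $\varepsilon \leq h$, consider an unweighted ε-graph built from the sequence $X_1, \dots, X_n$
that has been drawn i.i.d. from the density p. Fix i and j. Assume that $X_i$ and $X_j$ have distance
at least h from the boundary of X, and that the distance between $X_i$ and $X_j$ is at least $8\varepsilon$. Then
there exist constants $c_1, \dots, c_7 > 0$ (depending on the dimension and geometry of X) such that with
probability at least $1 - c_1 n \exp(-c_2 n \varepsilon^d) - c_3 \exp(-c_4 n \varepsilon^d)/\varepsilon^d$ the commute distance on the ε-graph
satisfies

$$\left| \frac{n\varepsilon^d}{\mathrm{vol}(G)} C_{ij} - \left( \frac{n\varepsilon^d}{d_i} + \frac{n\varepsilon^d}{d_j} \right) \right| \leq
\begin{cases}
c_5 / (n\varepsilon^d) & \text{if } d > 3 \\
c_6 \cdot \log(1/\varepsilon) / (n\varepsilon^3) & \text{if } d = 3 \\
c_7 / (n\varepsilon^3) & \text{if } d = 2
\end{cases}$$

The probability converges to 1 if $n \to \infty$ and $n\varepsilon^d / \log(n) \to \infty$. The right hand side of the deviation
bound converges to 0 as $n \to \infty$, if

$$\begin{cases}
n\varepsilon^d \to \infty & \text{if } d > 3 \\
n\varepsilon^3 / \log(1/\varepsilon) \to \infty & \text{if } d = 3 \\
n\varepsilon^3 = n\varepsilon^{d+1} \to \infty & \text{if } d = 2.
\end{cases}$$

Under these conditions, if the density p is continuous and if $\varepsilon \to 0$, then

$$\frac{n\varepsilon^d}{\mathrm{vol}(G)} C_{ij} \;\to\; \frac{1}{\eta_d\, p(X_i)} + \frac{1}{\eta_d\, p(X_j)}.$$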

Theorem 4 (Commute distance on kNN-graphs) Let X be a valid region with bottleneck h
and density bounds $p_{\min}$ and $p_{\max}$. Consider an unweighted kNN-graph (either symmetric or
mutual) such that $(k/n)^{1/d} / (2 p_{\max}) \leq h$, built from the sequence $X_1, \dots, X_n$ that has been drawn i.i.d.
from the density p.

Fix i and j. Assume that $X_i$ and $X_j$ have distance at least h from the boundary of X, and that the
distance between $X_i$ and $X_j$ is at least $4 (k/n)^{1/d} / p_{\max}$. Then there exist constants $c_1, \dots, c_6 > 0$
such that with probability at least $1 - c_1 n \exp(-c_2 k)$ the commute distance on both the symmetric
and the mutual kNN-graph satisfies

$$\left| \frac{k}{\mathrm{vol}(G)} C_{ij} - \left( \frac{k}{d_i} + \frac{k}{d_j} \right) \right| \leq
\begin{cases}
c_4 / k & \text{if } d > 3 \\
c_5 \cdot \log(n/k) / k & \text{if } d = 3 \\
c_6\, n^{1/2} / k^{3/2} & \text{if } d = 2
\end{cases}$$

The probability converges to 1 if $n \to \infty$ and $k / \log(n) \to \infty$. In case d > 3, the right hand side of
the deviation bound converges to 0 if $k \to \infty$ (and under slightly worse conditions in cases d = 3
and d = 2). Under these conditions, if the density p is continuous and if additionally $k/n \to 0$,
then $\frac{k}{\mathrm{vol}(G)} C_{ij} \to 2$ almost surely.

Let us make a couple of technical remarks about these theorems. 

To achieve the convergence of the commute distance we have to rescale it appropriately (for
example, in the ε-graph we scale by a factor of $n\varepsilon^d$). Our rescaling is exactly chosen such that the
limit expressions are finite, positive values. Scaling by any other factor in terms of n, ε or k either
leads to divergence or to convergence to zero.

In case d > 3, all convergence conditions on n and ε (or k, respectively) are the ones to be expected
for random geometric graphs. They are satisfied as soon as the degrees grow faster than log(n)
(for degrees of order smaller than log(n), the graphs are not connected anyway, see e.g. Penrose,
1999). Hence, our results hold for sparse as well as for dense connected random geometric graphs.
In dimensions 3 and 2, our rates are not of the same flavor as in the higher dimensions. For
example, in dimension 2 we need $n\varepsilon^3 \to \infty$ instead of $n\varepsilon^2 \to \infty$. On the one hand we are not too
surprised to get systematic differences between the lowest few dimensions. The same happens in
many situations, just consider the example of Polya's theorem about the recurrence/transience of
random walks on grids. On the other hand, these differences might as well be an artifact of our
proof methods (and we suspect so at least for the case d = 3; but even though we tried, we did
not get rid of the log factor in this case). It is a matter of future work to clarify this.

The valid region X has been introduced for technical reasons. We need to operate in such a region 
in order to be able to control the behavior of the graph, e.g. the minimal and maximal degrees. 
The assumptions on X are the standard assumptions used regularly in the random geometric graph 
literature. In our setting, we have the freedom of choosing $X \subset \mathbb{R}^d$ as we want. In order to obtain
the tightest bounds one should aim for a valid X that has a wide bottleneck h and a high minimal
density $p_{\min}$. In general this freedom of choosing X shows that if two points are in the same
high-density region of the space, the convergence of the commute distance is very fast, while it 
gets slower if the two points are in different regions of high density separated by a bottleneck. 

We stated the theorems above for a fixed pair i,j. However, they also hold uniformly over all pairs 
i,j that satisfy the conditions in the theorem (with exactly the same statement). The reason is 
that the main probabilistic quantities that enter the proofs are bounds on the minimal and maximal
degrees, which of course hold uniformly. 

4.2 Results based on spectral arguments 

The representation of the hitting and commute times in terms of the Moore-Penrose inverse of the
normalized graph Laplacian (Proposition 1) can be used to derive the following key proposition
that is the basis for all further results in this section.

Proposition 5 (Absolute and relative bounds in any fixed graph) Let G be a finite, connected,
undirected, possibly weighted graph that is not bipartite.

1. For $i \neq j$,

$$\left| \frac{1}{\mathrm{vol}(G)} H_{ij} - \frac{1}{d_j} \right| \leq 2 \left( \frac{1}{1 - \lambda_2} + 1 \right) \frac{w_{\max}}{d_{\min}^2}. \tag{1}$$



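2. For $i \neq j$,

$$\left| \frac{1}{\mathrm{vol}(G)} C_{ij} - \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \right| \leq \frac{w_{\max}}{d_{\min}} \left( \frac{1}{1 - \lambda_2} + 2 \right) \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \leq 2 \left( \frac{1}{1 - \lambda_2} + 2 \right) \frac{w_{\max}}{d_{\min}^2}. \tag{2}$$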



We would like to point out that even though the bound in Part 2 of the proposition is reminiscent
of statements in the literature, it is much tighter. Consider the following formula from Lovász
(1993),

$$\frac{1}{2} \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \leq \frac{1}{\mathrm{vol}(G)} C_{ij} \leq \frac{1}{1 - \lambda_2} \left( \frac{1}{d_i} + \frac{1}{d_j} \right),$$

that can easily be rearranged to the following bound:

$$\left| \frac{1}{\mathrm{vol}(G)} C_{ij} - \frac{1}{d_i} - \frac{1}{d_j} \right| \leq \frac{1}{1 - \lambda_2} \cdot \frac{2}{d_{\min}}. \tag{3}$$

The major difference between our bound (2) and Lovász' bound (3) is that while the latter has the
term $d_{\min}$ in the denominator, our bound has the term $d_{\min}^2$ in the denominator. This makes all the
difference: in the graphs under consideration our bound converges to 0 whereas Lovász' bound
diverges.
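
To see the difference in orders of magnitude, here is a rough back-of-the-envelope comparison, added for illustration with all constants dropped. It assumes an unweighted ε-graph with $w_{\max} = 1$, degrees of order $n\varepsilon^d$, and a spectral gap $1 - \lambda_2$ of order $\varepsilon^2$, and it multiplies both bounds by the rescaling factor $n\varepsilon^d$ used for the commute distance on ε-graphs:

```latex
\begin{align*}
\text{our bound:} \quad
  & n\varepsilon^d \cdot \frac{1}{1 - \lambda_2} \cdot \frac{w_{\max}}{d_{\min}^2}
    \;\sim\; n\varepsilon^d \cdot \frac{1}{\varepsilon^2} \cdot \frac{1}{(n\varepsilon^d)^2}
    \;=\; \frac{1}{n\varepsilon^{d+2}}
    \;\to\; 0 \quad \text{if } n\varepsilon^{d+2} \to \infty, \\
\text{Lov\'asz' bound:} \quad
  & n\varepsilon^d \cdot \frac{1}{1 - \lambda_2} \cdot \frac{1}{d_{\min}}
    \;\sim\; n\varepsilon^d \cdot \frac{1}{\varepsilon^2} \cdot \frac{1}{n\varepsilon^d}
    \;=\; \frac{1}{\varepsilon^2}
    \;\to\; \infty \quad \text{if } \varepsilon \to 0.
\end{align*}
```

The extra factor $1/d_{\min}$ is exactly what compensates for the shrinking spectral gap.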



4.2.1 Application to unweighted random geometric graphs 

In the following we are going to apply Proposition 5 to various random geometric graphs. Next
to some standard results about the degrees and number of edges in random geometric graphs, the 
main ingredients are the following bounds on the spectral gap in random geometric graphs. These 
bounds are of independent interest because the spectral gap governs many important properties 
and processes on graphs. 



Theorem 6 (Spectral gap of the ε-graph) Suppose that the general assumptions hold. Then
there exist constants $c_1, \dots, c_6 > 0$ such that with probability at least
$1 - c_1 n \exp(-c_2 n \varepsilon^d) - c_3 \exp(-c_4 n \varepsilon^d)/\varepsilon^d$,

$$1 - \lambda_2 \geq c_5 \cdot \varepsilon^2 \qquad \text{and} \qquad 1 - |\lambda_n| \geq c_6 \cdot \varepsilon^{d+1} / n.$$

If $n\varepsilon^d / \log n \to \infty$, then this probability converges to 1.

Theorem 7 (Spectral gap of the kNN-graph) Suppose that the general assumptions hold. Then
for both the symmetric and the mutual kNN-graph there exist constants $c_1, \dots, c_4 > 0$ such that
with probability at least $1 - c_1 n \exp(-c_2 k)$,

$$1 - \lambda_2 \geq c_3 \cdot (k/n)^{2/d} \qquad \text{and} \qquad 1 - |\lambda_n| \geq c_4 \cdot k^{2/d} / n^{(d+2)/d}.$$

If $k / \log n \to \infty$, then the probability converges to 1.

At first glance it seems surprising that the geometry of the underlying space X does not affect
the order of magnitude of the spectral gap: the geometric quantities only enter the bound through
the constants (as can be seen in the proofs below). In particular, for large n the spectral gap does
not depend on whether X has a "bottleneck" or not. Intuitively this is the case because if the
sample size is large, even a bottleneck with very small diameter contains many sample points and
"appears wide" to the random walk.

The following corollaries characterize the hitting and commute times for ε- and kNN-graphs. They
are direct consequences of plugging the results about the spectral gap into Proposition 5.

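Corollary 8 (Hitting and commute times on ε-graphs) Assume that the general assumptions
hold. Consider an unweighted ε-graph built from the sequence $X_1, \ldots, X_n$ drawn i.i.d. from
the density p. Then there exist constants $c_1, \ldots, c_5 > 0$ such that with probability at least
$1 - c_1 n \exp(-c_2 n \varepsilon^d) - c_3 \exp(-c_4 n \varepsilon^d)/\varepsilon^d$, we have uniformly for all $i \neq j$ that

$$\left| \frac{n \varepsilon^d}{\mathrm{vol}(G)} H_{ij} - \frac{n \varepsilon^d}{d_j} \right| \;\le\; \frac{c_5}{n \varepsilon^{d+2}}. \qquad (4)$$

If the density p is continuous and $n \to \infty$, $\varepsilon \to 0$ and $n \varepsilon^{d+2} \to \infty$, then

$$\frac{n \varepsilon^d}{\mathrm{vol}(G)} H_{ij} \;\longrightarrow\; \frac{1}{\eta_d \cdot p(X_j)} \quad \text{almost surely.}$$

For the commute times, the analogous results hold due to $C_{ij} = H_{ij} + H_{ji}$.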



Corollary 9 (Hitting and commute times on kNN-graphs) Assume that the general assumptions
hold. Consider an unweighted kNN-graph built from the sequence $X_1, \ldots, X_n$ drawn
i.i.d. from the density p. Then for both the symmetric and mutual kNN-graph there exist constants
$c_1, c_2, c_3 > 0$ such that with probability at least $1 - c_1 \cdot n \cdot \exp(-k c_2)$, we have uniformly for all
$i \neq j$ that

$$\left| \frac{k}{\mathrm{vol}(G)} H_{ij} - \frac{k}{d_j} \right| \;\le\; c_3 \cdot \frac{n^{2/d}}{k^{1+2/d}}. \qquad (5)$$

If the density p is continuous and $n \to \infty$, $k/n \to 0$ and $k (k/n)^{2/d} \to \infty$, then

$$\frac{k}{\mathrm{vol}(G)} H_{ij} \;\longrightarrow\; 1 \quad \text{almost surely.}$$

For the commute times, the analogous results hold due to $C_{ij} = H_{ij} + H_{ji}$.



4.2.2 Application to weighted graphs 

In several applications, ε-graphs or kNN-graphs are not used as unweighted graphs, but are
additionally endowed with edge weights. For example, in the field of machine learning it is common
to use Gaussian weights $w_{ij}$ that decay with the squared distance $\|X_i - X_j\|^2$ on a scale set by a
bandwidth parameter $h > 0$.

We can use standard spectral results to prove approximation theorems in such cases. 



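Theorem 10 (Results on fully connected weighted graphs) Consider a fixed, fully connected
weighted graph with weight matrix W. Assume that its entries are upper and lower bounded
by some constants $w_{\min}$, $w_{\max}$, that is $0 < w_{\min} \le w_{ij} \le w_{\max}$ for all i, j. Then, uniformly for all
$i, j \in \{1, \ldots, n\}$, $i \neq j$,

$$\left| \frac{n}{\mathrm{vol}(G)} H_{ij} - \frac{n}{d_j} \right| \;\le\; 4n \, \frac{w_{\max}}{d_{\min}^2} \;\le\; 4 \, \frac{w_{\max}}{w_{\min}^2} \cdot \frac{1}{n}.$$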



For example, this result can be applied directly to a Gaussian similarity graph (for fixed bandwidth h).



The next theorem treats the case of Gaussian similarity graphs with adapted bandwidth h. The 
technique we use to prove this theorem is very general. Using the Rayleigh principle, we reduce 
the case of the fully connected Gaussian graph to a truncated graph where edges beyond a certain 
length are removed. Bounds for this truncated graph, in turn, can be reduced to bounds of the 
unweighted e-graph. With this technique it is possible to treat very general classes of graphs. 
Theorem 11 (Results on Gaussian graphs with adapted bandwidth) Let $X \subset \mathbb{R}^d$ be a
compact set and p a continuous, strictly positive density on X. Consider a fully connected, weighted
similarity graph built from the points $X_1, \ldots, X_n$ drawn i.i.d. with density p. As weight
function use the Gaussian similarity function
$k_h(x, y) = \frac{1}{(2 \pi h^2)^{d/2}} \exp\left( - \frac{\|x - y\|^2}{2 h^2} \right)$. If the density p
is continuous and $n \to \infty$, $h \to 0$ and $n h^{d+2}/\log(n) \to \infty$, then

$$\frac{n}{\mathrm{vol}(G)} C_{ij} \;\longrightarrow\; \frac{1}{p(X_i)} + \frac{1}{p(X_j)} \quad \text{almost surely.}$$



Note that in this theorem, we introduced the scaling factor $1/h^d$ already in the definition of the
Gaussian similarity function to obtain the correct density estimate $p(X_j)$ in the limit. For this
reason, the resistance results are rescaled with factor n instead of $n h^d$.



4.2.3 Application to random graphs with given expected degrees and Erdős-Rényi graphs

Consider the general random graph model where the edge between vertices i and j is chosen
independently with a certain probability $p_{ij}$ that is allowed to depend on i and j. This model
contains very popular random graph models such as the Erdős-Rényi random graph, planted
partition graphs, and random graphs with given expected degrees. For this class of random graphs,
the following result has been proved recently by Chung and Radcliffe (2011).



Theorem 12 (Chung and Radcliffe, 2011) Let G be a random graph where edges between vertices
i and j are put independently with probabilities $p_{ij}$. Consider the normalized Laplacian $L_{\mathrm{sym}}$,
and define the expected normalized Laplacian as the matrix $\bar{L}_{\mathrm{sym}} := I - \bar{D}^{-1/2} \bar{A} \bar{D}^{-1/2}$ where
$\bar{A}_{ij} = E(A_{ij}) = p_{ij}$ and $\bar{D} = E(D)$. Let $\bar{d}_{\min}$ be the minimal expected degree. Denote the
eigenvalues of $L_{\mathrm{sym}}$ by $\mu$, the ones of $\bar{L}_{\mathrm{sym}}$ by $\bar{\mu}$. Choose $\varepsilon > 0$. Then there exists a constant $k = k(\varepsilon)$
such that if $\bar{d}_{\min} > k \log(n)$, then with probability at least $1 - \varepsilon$,

$$\forall j = 1, \ldots, n: \quad |\mu_j - \bar{\mu}_j| \;\le\; 2 \sqrt{\frac{3 \log(4n/\varepsilon)}{\bar{d}_{\min}}}.$$



Application to Erdős-Rényi graphs. Here all edges have the same probability $p_{ij} = p$ (for
simplicity, we also allow for self-edges).

Corollary 13 (Results on Erdős-Rényi graphs) Let $n \to \infty$, $p = \omega(\log n / n)$. Then the
rescaled hitting times on the Erdős-Rényi graph converge to a constant: for all vertices u, v in
the graph we have

$$\left| \frac{1}{n} H_{uv} - 1 \right| \;=\; O\left( \frac{1}{n p} \right) \;\to\; 0 \quad \text{in probability.}$$



Application to planted partition graphs.

Next we consider a simple model of an Erdős-Rényi graph with planted partitions, the planted
bisection case. Assume that the n vertices are split into two "clusters" of equal size. We put an
edge between two vertices u and v with probability $p_{\mathrm{within}}$ if they are in the same cluster and with
probability $p_{\mathrm{between}} < p_{\mathrm{within}}$ if they are in different clusters. For simplicity we allow self-loops.

Corollary 14 (Random graph with planted partitions) Consider an Erdős-Rényi graph with
planted bisection. Assume that $p_{\mathrm{within}} = \omega(\log(n)/n)$ and that $p_{\mathrm{between}}$ is such that
$n \, p_{\mathrm{between}} \to \infty$ (arbitrarily slowly). Then, for all vertices u, v in the graph,

$$\left| \frac{1}{n} H_{uv} - 1 \right| \;=\; O\left( \frac{1}{n \, p_{\mathrm{between}}} \right) \;\to\; 0 \quad \text{in probability.}$$
This result is a prime example of how, even though there is a strong cluster structure
in the graph, the hitting times and commute distances can no longer see this cluster structure
once the graph gets too large. Note that the corollary even holds if $p_{\mathrm{between}}$ grows much more slowly than
$p_{\mathrm{within}}$. That is, the larger our graph, the more pronounced is the cluster structure. Nevertheless,
the commute distance converges to a trivial result. On the other hand, we also see that the speed
of convergence is $O(1/(n \, p_{\mathrm{between}}))$; that is, if $p_{\mathrm{between}} = g(n)/n$ with a very slowly growing function g,
then convergence can be very slow. We might need very large graphs before the degeneracy of the
commute time becomes visible.
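
The degeneracy described above can be observed numerically. The following is a minimal sketch, not the authors' experiment: the graph size, the two edge probabilities, the random seed and the helper names `hitting_times` and `planted_bisection` are all illustrative assumptions, and self-loops are omitted. Hitting times are computed from the pseudo-inverse of the normalized Laplacian via the standard spectral formula $H_{ij} = \mathrm{vol}(G) \, (L^\dagger_{jj}/d_j - L^\dagger_{ij}/\sqrt{d_i d_j})$.

```python
import numpy as np

def hitting_times(W):
    """All-pairs hitting times H[i, j] (expected number of steps from i to j)."""
    d = W.sum(axis=1)
    vol = d.sum()
    s = 1.0 / np.sqrt(d)
    A = s[:, None] * W * s[None, :]               # D^{-1/2} W D^{-1/2}
    P1 = np.outer(np.sqrt(d), np.sqrt(d)) / vol   # projection onto the null space of L_sym
    # Pseudo-inverse of L_sym = I - A; valid for a connected graph.
    L_pinv = np.linalg.inv(np.eye(len(d)) - A + P1) - P1
    # H_ij = vol * (L_pinv_jj / d_j - L_pinv_ij / sqrt(d_i d_j))
    return vol * (np.diag(L_pinv)[None, :] / d[None, :] - L_pinv * np.outer(s, s))

def planted_bisection(n, p_within, p_between, rng):
    """Symmetric 0/1 adjacency matrix with two equal clusters (no self-loops)."""
    labels = np.repeat([0, 1], n // 2)
    same_cluster = labels[:, None] == labels[None, :]
    probs = np.where(same_cluster, p_within, p_between)
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(float), labels

n = 200                                   # illustrative size, not taken from the text
rng = np.random.default_rng(0)
W, labels = planted_bisection(n, 0.5, 0.1, rng)
H = hitting_times(W)
d, vol = W.sum(axis=1), W.sum()

rel = H * d[None, :] / vol                # H_ij * d_j / vol(G), expected to be close to 1
off = ~np.eye(n, dtype=bool)
same = labels[:, None] == labels[None, :]
max_dev = np.abs(rel - 1.0)[off].max()
print("max |H_ij d_j / vol(G) - 1|  :", max_dev)
print("mean ratio, within clusters  :", rel[off & same].mean())
print("mean ratio, between clusters :", rel[off & ~same].mean())
```

In such a run the rescaled hitting times within and between clusters should come out nearly indistinguishable, although the graph has a clear two-cluster structure.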



Application to random graphs with given expected degrees. For a graph of n vertices we have n
parameters $\bar{d}_1, \ldots, \bar{d}_n > 0$. For each pair of vertices $v_i$ and $v_j$, we independently place an edge
between these two vertices with probability $\bar{d}_i \bar{d}_j / \sum_{k=1}^n \bar{d}_k$. It is easy to see that in this model,
vertex $v_i$ has expected degree $\bar{d}_i$ (cf. Section 5.3 in Chung and Lu 2006 for background reading).



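Corollary 15 (Results on random graphs with expected degrees) Consider any sequence
of random graphs with expected degrees such that $\bar{d}_{\min} = \omega(\log n)$. Then the commute distances
satisfy for all $i \neq j$,

$$\frac{\left| \frac{1}{\mathrm{vol}(G)} C_{ij} - \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \right|}{\frac{1}{d_i} + \frac{1}{d_j}} \;=\; O\left( \frac{1}{\log(2n)} \right) \;\to\; 0, \quad \text{almost surely.}$$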



5 Proofs for the flow-based approach 

For notational convenience, in this section we work with the resistance distance $R_{uv} = C_{uv}/\mathrm{vol}(G)$
instead of the commute distance $C_{uv}$, so that we do not have to carry the factor $1/\mathrm{vol}(G)$ everywhere.



5.1 Lower bound 

It is easy to prove that the resistance distance between two points is lower bounded by the sum of 
the inverse degrees. 

Proposition 16 (Lower bound) Let G be a weighted, undirected, connected graph and consider
two vertices s and t, $s \neq t$. Assume that G remains connected if we remove s and t. Then the
effective resistance between s and t is bounded by

$$R_{st} \;\ge\; \frac{Q_{st}}{1 + w_{st} Q_{st}},$$

where $Q_{st} = 1/(d_s - w_{st}) + 1/(d_t - w_{st})$. Note that if s and t are not connected by a direct edge
(that is, $w_{st} = 0$), then the right hand side simplifies to $1/d_s + 1/d_t$.

Proof. The proof is based on Rayleigh's monotonicity principle, which states that increasing edge
weights in the graph can never increase the effective resistance between two vertices (cf. Corollary
7 in Section IX.2 of Bollobás 1998). Given our original graph G, we build a new graph G' by
setting the weight of all edges to infinity, except the edges that are adjacent to s or t (setting the
weight of an edge to infinity means that this edge has no resistance any more). This can also be
interpreted as taking all vertices except s and t and merging them into one super-node a. Now our
graph G' consists of three vertices s, a, t with several parallel edges from s to a, several parallel
edges from a to t, and potentially the original edge between s and t (if it existed in G). Exploiting
the laws of electrical networks (resistances add along edges in series, conductances add along edges
in parallel; see Section 2.3 in Lyons and Peres (2010) for detailed instructions and examples) leads
to the desired result. □
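
The lower bound can be checked numerically on a small example. The sketch below is illustrative only: the 5-vertex weight matrix is an arbitrary choice, and `effective_resistance` and `lower_bound` are hypothetical helper names. Edge weights are treated as conductances, and the effective resistance is read off the pseudo-inverse of the unnormalized Laplacian.

```python
import numpy as np

def effective_resistance(W, s, t):
    """Effective resistance between s and t; edge weights act as conductances."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    # Pseudo-inverse of the Laplacian of a connected graph: (L + J/n)^{-1} - J/n.
    L_pinv = np.linalg.inv(L + 1.0 / n) - 1.0 / n
    return L_pinv[s, s] + L_pinv[t, t] - 2.0 * L_pinv[s, t]

def lower_bound(W, s, t):
    """Q_st / (1 + w_st * Q_st) with Q_st = 1/(d_s - w_st) + 1/(d_t - w_st)."""
    d = W.sum(axis=1)
    w_st = W[s, t]
    q = 1.0 / (d[s] - w_st) + 1.0 / (d[t] - w_st)
    return q / (1.0 + w_st * q)

# Arbitrary symmetric example graph; it stays connected when either pair below is removed.
W = np.array([
    [0, 2, 1, 1, 0],
    [2, 0, 1, 0, 3],
    [1, 1, 0, 2, 1],
    [1, 0, 2, 0, 1],
    [0, 3, 1, 1, 0],
], dtype=float)

for s, t in [(0, 4), (0, 1)]:            # (0, 4) has no direct edge, (0, 1) has one
    print((s, t), "R_st =", effective_resistance(W, s, t), " bound =", lower_bound(W, s, t))
```

For the pair (0, 4) there is no direct edge, so the bound reduces to $1/d_s + 1/d_t = 1/4 + 1/5$.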
5.2 Upper bound

This is the part that requires the hard work. Our proof is based on a theorem that shows how the
resistance between two points in the graph can be computed in terms of flows on the graph. The
following result is taken from Corollary 6 in Section IX.2 of Bollobás (1998).

Theorem 17 (Resistance in terms of flows, cf. Bollobás, 1998) Let G = (V, E) be a weighted
graph with edge weights $w_e$ ($e \in E$). The effective resistance $R_{st}$ between two fixed vertices s and
t can be expressed as

$$R_{st} \;=\; \inf \left\{ \sum_{e \in E} \frac{u_e^2}{w_e} \;\middle|\; u = (u_e)_{e \in E} \text{ unit flow from } s \text{ to } t \right\}. \qquad (6)$$



Note that evaluating the formula in the above theorem for any fixed flow leads to an upper bound 
on the effective resistance. The key to obtaining a tight bound is to distribute the flow as widely 
and uniformly over the graph as possible. 
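
To illustrate why spreading the flow helps, here is a small sketch on an unweighted 4-cycle (an arbitrary toy graph chosen for this illustration; `effective_resistance` and `flow_energy` are hypothetical helper names). Every unit flow yields an upper bound on $R_{st}$ through the sum $\sum_e u_e^2 / w_e$, and the flow that splits evenly over both paths attains the exact resistance.

```python
import numpy as np

def effective_resistance(W, s, t):
    """Effective resistance between s and t; edge weights act as conductances."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    L_pinv = np.linalg.inv(L + 1.0 / n) - 1.0 / n   # pseudo-inverse for a connected graph
    return L_pinv[s, s] + L_pinv[t, t] - 2.0 * L_pinv[s, t]

def flow_energy(W, flow):
    """Sum of u_e^2 / w_e over the edges carrying flow; `flow` maps (i, j) -> u_e."""
    return sum(u * u / W[i, j] for (i, j), u in flow.items())

# Unweighted 4-cycle 0-1-3-2-0 with s = 0 and t = 3.
W = np.zeros((4, 4))
for i, j in [(0, 1), (1, 3), (0, 2), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

single_path = {(0, 1): 1.0, (1, 3): 1.0}                        # all flow along one path
split = {(0, 1): 0.5, (1, 3): 0.5, (0, 2): 0.5, (2, 3): 0.5}    # flow split over both paths

R = effective_resistance(W, 0, 3)
print("R_st             :", R)
print("single-path flow :", flow_energy(W, single_path))
print("split flow       :", flow_energy(W, split))
```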

For the case of geometric graphs we are going to use a grid on the underlying space to construct
an efficient flow between two vertices. Let $X_1, \ldots, X_n$ be a fixed set of points in $\mathbb{R}^d$ and consider a
geometric graph G with vertices $X_1, \ldots, X_n$. Fix any two of them, say $s := X_1$ and $t := X_2$. Let
$X \subset \mathbb{R}^d$ be a connected set that contains both s and t. Consider a regular grid with grid width g
on X. We say that grid cells are neighbors of each other if they touch each other in at least one
edge.

Definition 18 (Valid grid) We call the grid valid if the following properties are satisfied:

1. The grid width is not too small: Each cell of the grid contains at least one of the points
$X_1, \ldots, X_n$.

2. The grid width g is not too large: Points in the same or neighboring cells of the grid are
always connected in the graph G.

3. Relation between grid width and geometry of X: Define the bottleneck h of the region X as
the largest u such that the set $\{x \in X \mid \mathrm{dist}(x, \partial X) > u/2\}$ is connected.
We require that $\sqrt{d} \, g \le h$ (a cube of side length g should fit in the bottleneck).

We now prove the following general proposition that gives an upper bound on the resistance 
distance between vertices in a fixed geometric graph. 

Proposition 19 (Resistance on a fixed geometric graph) Consider a fixed set of points
$X_1, \ldots, X_n$ in some connected region $X \subset \mathbb{R}^d$ and a geometric graph on $X_1, \ldots, X_n$. Assume that
X has bottleneck not smaller than h (where the bottleneck is defined as in the definition of a valid
grid). Denote $s = X_1$ and $t = X_2$. Assume that s and t can be connected by a straight line that
stays inside X and has distance at least h/2 to $\partial X$. Denote the distance between s and t by d(s, t).
Let g be the width of a valid grid on X and assume that $d(s, t) > 4 \sqrt{d} \, g$. By $N_{\min}$ denote the
minimal number of points in each grid cell, and define a as

$$a := \left\lfloor h / (2 g \sqrt{d - 1}) \right\rfloor. \qquad (7)$$

Assume that points that are connected in the graph are at most Q grid cells apart from each other
(for example, two points in the two grey cells in Figure 3b are 5 cells apart from each other). Then
the effective resistance between s and t can be bounded as follows:

In case d > 3:

$$R_{st} \;\le\; \frac{1}{d_s} + \frac{1}{d_t} + \left( \frac{1}{d_s} + \frac{1}{d_t} \right) \frac{1}{N_{\min}} + \frac{1}{N_{\min}^2} \left( 6 + \frac{d(s,t)}{g (2a+1)^3} + 2Q \right) \qquad (8)$$

In case d = 3:

$$R_{st} \;\le\; \frac{1}{d_s} + \frac{1}{d_t} + \left( \frac{1}{d_s} + \frac{1}{d_t} \right) \frac{1}{N_{\min}} + \frac{1}{N_{\min}^2} \left( \log(a) + 2 + \frac{d(s,t)}{g (2a+1)^2} + 2Q \right) \qquad (9)$$

In case d = 2:

$$R_{st} \;\le\; \frac{1}{d_s} + \frac{1}{d_t} + \left( \frac{1}{d_s} + \frac{1}{d_t} \right) \frac{1}{N_{\min}} + \frac{1}{N_{\min}^2} \left( 2a + 2 + \frac{d(s,t)}{g (2a+1)} + 2Q \right) \qquad (10)$$
Figure 3: The flow construction (overview). (a) Step 1. Distributing the flow from s (black dot)
to all its neighbors (grey dots). (b) Step 2. We bring back all flow from p to C(s). Also shown in
the figure is the hypercube to which the flow will be expanded in Step 3. (c) Steps 3 and 4 of the
flow construction: distribute the flow from C(s) to a "hypercube" H(s), then transmit it to a
similar hypercube H(t) and guide it to C(t).



The general idea of the proof is to construct a flow from s to t with the help of the underlying
grid. On a high level, the construction is not difficult, but the details are lengthy
and a bit tedious. The rest of this section is devoted to it.



Construction of the flow: overview. Without loss of generality we assume that there exists
a straight line connecting s and t which is along the first dimension of the space.

Step 0: We start a unit flow in vertex s.

Step 1: We make a step to all neighbors Neigh(s) of s and distribute the flow uniformly over all
edges. That is, we traverse $d_s$ edges and send flow $1/d_s$ over each edge (see Figure 3a).

Step 2: Some of the flow now sits inside C(s), but some of it might sit outside of C(s). In this step,
we bring back all flow to C(s) in order to control it later on (see Figure 3b).

Step 3: We now distribute the flow from C(s) to a larger region, namely to a hypercube H(s) of
side length h that is perpendicular to the linear path from s to t and centered at C(s) (see
the hypercubes in Figure 3c). This can be achieved in several substeps that will be defined
below.

Step 4: We now traverse from H(s) to an analogous hypercube H(t) located at t using parallel paths,
see Figure 3c.

Step 5: From the hypercube H(t) we send the flow to the neighborhood Neigh(t) (this is the "reverse"
of steps 2 and 3).

Step 6: From Neigh(t) we finally send the flow to the destination t ("reverse" of step 1).



Details of the flow construction and computation of the resistance between s and t in
the general case d > 3. We now describe the individual steps and their contribution to the
bound on the resistance. We start with the general case d > 3. We will discuss the special cases
d = 2 and d = 3 below.

In the computations below, by the "contribution of a step" we mean the part of the sum in
Theorem 17 that goes over the edges considered in the current step.



Step 1: We start with a unit flow at s that we send over all $d_s$ adjacent edges. This leads to flow
$1/d_s$ over $d_s$ edges. According to the formula in Theorem 17 this contributes

$$r_1 = d_s \cdot \frac{1}{d_s^2} = \frac{1}{d_s}$$

to the overall resistance $R_{st}$.

Figure 4: Details of Step 3 between Layers i − 1 and i. The first row corresponds to the expansion
phase, the second row to the redistribution phase. The figure is shown for the case of d = 3.
(a) Definition of layers (C(s), Layer 1, Layer 2). (b) Before Step 3a starts, all flow is uniformly
distributed in Layer i − 1 (dark area). (c) Step 3a then distributes the flow from Layer i − 1 to
the adjacent cells in Layer i. (d) After Step 3a: all flow is in Layer i, but not yet uniformly
distributed. (e) Step 3b redistributes the flow in Layer i. (f) After Step 3b, the flow is uniformly
distributed in Layer i.



Step 2: After Step 1, the flow sits on all neighbors of s, and these neighbors are not necessarily
all contained in C(s). To proceed we want to re-concentrate all flow in C(s). For each neighbor p
of s, we thus carry the flow along a Hamming path of cells from p back to C(s), see Figure 3b for
an illustration.

To compute an upper bound for Step 2 we exploit that each neighbor p of s has to traverse at most
Q cells to reach C(s) (recall the definition of Q from the proposition). Let us fix p. After Step 1,
we have flow of size $1/d_s$ in p. We now move this flow from p to all points in the neighboring cell
C(2) (cf. Figure 3b). For this we can use at least $N_{\min}$ edges. Thus we send flow of size $1/d_s$ over
$N_{\min}$ edges, that is, each edge receives flow $1/(d_s N_{\min})$. Summing the flow from C(p) to C(2), for
all points p, gives

$$d_s N_{\min} \left( \frac{1}{d_s N_{\min}} \right)^2 = \frac{1}{d_s N_{\min}}.$$

Then we transport the flow from C(2) along to C(s). Between each two cells on the way we can
use $N_{\min}^2$ edges. Note, however, that we need to take into account that some of these edges might
be used several times (for different points p). In the worst case, C(2) is the same for all points p,
in which case we send the whole unit flow over these edges. This amounts to flow of size $1/N_{\min}^2$
over $(Q - 1) N_{\min}^2$ edges, that is a contribution of

$$\frac{Q - 1}{N_{\min}^2}.$$

Altogether we obtain

$$r_2 \;\le\; \frac{1}{d_s N_{\min}} + \frac{Q}{N_{\min}^2}.$$



Step 3: At the beginning of this step, the complete unit flow resides in the cube C(s). We now
want to distribute this flow to a "hypercube" of three dimensions (no matter what d is, as long
as d > 3) that is perpendicular to the line that connects s and t (see Figure 3c, where the case of
d = 3 and a 2-dimensional "hypercube" are shown). To distribute the flow to this cube we divide
it into layers (see Figure 4a). Layer 0 consists of the cell C(s) itself, the first layer consists of all
cells adjacent to C(s), and so on. Each side of Layer i consists of

$$l_i = (2i + 1)$$

cells. For the 3-dimensional cube, the number $z_i$ of grid cells in Layer i, $i \ge 1$, is given as

$$z_i \;=\; \underbrace{6 \cdot (2i - 1)^2}_{\text{interior cells of the faces}} \;+\; \underbrace{12 \cdot (2i - 1)}_{\text{cells along the edges (excluding corners)}} \;+\; \underbrace{8}_{\text{corner cells}}.$$

All in all we consider a layers, with a as defined in Equation (7), so that the final layer has
diameter just a bit smaller than the bottleneck h. We now
distribute the flow stepwise through all layers, starting with unit flow in Layer 0. To send the flow
from Layer i − 1 to Layer i we use two phases, see Figure 4 for details. In the "expansion phase"
3a(i) we transmit the flow from Layer i − 1 to all adjacent cells in Layer i. In the "redistribution
phase" 3b(i) we then redistribute the flow in Layer i to achieve that it is uniformly distributed in
Layer i. In all phases, the aim is to use as many edges as possible.
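
As a quick sanity check of the cell count, the sketch below compares the face/edge/corner decomposition of Layer i with the direct count of a cubic shell, $(2i+1)^3 - (2i-1)^3$. The function name `cells_in_layer` is a hypothetical helper introduced only for this illustration.

```python
def cells_in_layer(i):
    """Number of grid cells in Layer i of the 3-dimensional cube around C(s)."""
    if i == 0:
        return 1                       # Layer 0 is the cell C(s) itself
    faces = 6 * (2 * i - 1) ** 2       # interior cells of the six faces
    edges = 12 * (2 * i - 1)           # cells along the twelve edges, corners excluded
    corners = 8                        # corner cells
    return faces + edges + corners

for i in range(1, 6):
    shell = (2 * i + 1) ** 3 - (2 * i - 1) ** 3   # outer cube minus inner cube
    print(i, cells_in_layer(i), shell, 24 * i * i + 2)
```

Both counts agree with the closed form $24 i^2 + 2$, so the number of cells per layer grows quadratically in i.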

Expansion phase 3a(i). We can lower bound the number of edges between Layer i − 1 and Layer i
by $z_{i-1} N_{\min}^2$: each of the $z_{i-1}$ cells in Layer i − 1 is adjacent to at least one of the cells in Layer i,
and each cell contains at least $N_{\min}$ points. Consequently, we can upper bound the contribution
of the edges in the expansion phase 3a(i) to the resistance by

$$r_{3a(i)} \;\le\; z_{i-1} N_{\min}^2 \cdot \left( \frac{1}{z_{i-1} N_{\min}^2} \right)^2 = \frac{1}{z_{i-1} N_{\min}^2}.$$

Redistribution phase 3b(i). We make a crude upper bound for the redistribution phase. In this
phase we have to move some part of the flow from each cell to its neighboring cells. For simplicity
we bound this by assuming that for each cell, we had to move all its flow to neighboring cells. By
a similar argument as for Step 3a(i), the contribution of the redistribution step can be bounded by

$$r_{3b(i)} \;\le\; z_i N_{\min}^2 \cdot \left( \frac{1}{z_i N_{\min}^2} \right)^2 = \frac{1}{z_i N_{\min}^2}.$$

All of Step 3. All in all we have a layers. Thus the overall contribution of Step 3 to the resistance
can be bounded by

$$r_3 \;=\; \sum_{i=1}^{a} \left( r_{3a(i)} + r_{3b(i)} \right) \;\le\; \sum_{i=1}^{a} \frac{2}{z_{i-1} N_{\min}^2} \;\le\; \frac{2}{N_{\min}^2} \left( 1 + \frac{1}{24} \sum_{i=1}^{a-1} \frac{1}{i^2} \right) \;\le\; \frac{3}{N_{\min}^2}. \qquad (11)$$

To see the last inequality, note that the sum $\sum_{i=1}^{a-1} 1/i^2$ is a partial sum of the over-harmonic series
that converges to a constant smaller than 2.
Step 4: Now we transfer all flow in "parallel cell paths" from H(s) to H(t). We have $(2a + 1)^3$
parallel rows of cells going from H(s) to H(t), each of them contains d(s, t)/g cells. Thus all in all
we traverse $(2a + 1)^3 N_{\min}^2 \, d(s,t)/g$ edges, and each edge carries flow $1/((2a + 1)^3 N_{\min}^2)$. Thus step
4 contributes

$$r_4 \;\le\; (2a + 1)^3 N_{\min}^2 \, \frac{d(s,t)}{g} \left( \frac{1}{(2a + 1)^3 N_{\min}^2} \right)^2 = \frac{d(s,t)}{g (2a + 1)^3 N_{\min}^2}.$$

Step 5 is completely analogous to steps 2 and 3, with the analogous contribution
$r_5 \le \frac{1}{d_t N_{\min}} + \frac{Q}{N_{\min}^2} + \frac{3}{N_{\min}^2}$.
Step 6 is completely analogous to step 1 with overall contribution of $r_6 = 1/d_t$.

Summing up the general case d > 3. All these contributions lead to the following overall
bound on the resistance in case d > 3:

$$R_{st} \;\le\; \frac{1}{d_s} + \frac{1}{d_t} + \left( \frac{1}{d_s} + \frac{1}{d_t} \right) \frac{1}{N_{\min}} + \frac{1}{N_{\min}^2} \left( 6 + \frac{d(s,t)}{g (2a+1)^3} + 2Q \right)$$

with a and Q as defined in Proposition 19. This is the result stated in the proposition for case d > 3.
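
The bookkeeping behind this sum can be spelled out in a few lines. The sketch below is an illustration under stated assumptions: it takes the per-step bounds for d > 3 to be $r_1 = 1/d_s$, $r_2 = 1/(d_s N_{\min}) + Q/N_{\min}^2$, $r_3 = 3/N_{\min}^2$, $r_4 = d(s,t)/(g(2a+1)^3 N_{\min}^2)$, $r_5$ the mirror image of $r_2 + r_3$ at t, and $r_6 = 1/d_t$; the function names and all numeric parameter values are hypothetical. Exact rational arithmetic is used so that the comparison with the closed form is not blurred by rounding.

```python
from fractions import Fraction

def step_contributions(d_s, d_t, n_min, a, q, dist, g):
    """Upper bounds r_1, ..., r_6 on the per-step contributions in the case d > 3."""
    d_s, d_t, n_min, q, dist, g = map(Fraction, (d_s, d_t, n_min, q, dist, g))
    r1 = 1 / d_s                                        # Step 1: spread flow from s
    r2 = 1 / (d_s * n_min) + q / n_min**2               # Step 2: bring flow back to C(s)
    r3 = 3 / n_min**2                                   # Step 3: expand to the hypercube H(s)
    r4 = dist / (g * (2 * a + 1) ** 3 * n_min**2)       # Step 4: parallel paths to H(t)
    r5 = 1 / (d_t * n_min) + q / n_min**2 + 3 / n_min**2  # Step 5: reverse of Steps 2 and 3
    r6 = 1 / d_t                                        # Step 6: collect flow in t
    return [r1, r2, r3, r4, r5, r6]

def closed_form(d_s, d_t, n_min, a, q, dist, g):
    """Closed-form bound obtained by summing the six contributions."""
    d_s, d_t, n_min, q, dist, g = map(Fraction, (d_s, d_t, n_min, q, dist, g))
    inv_deg = 1 / d_s + 1 / d_t
    return inv_deg + inv_deg / n_min + (6 + dist / (g * (2 * a + 1) ** 3) + 2 * q) / n_min**2

# Hypothetical parameter values, chosen only to exercise the formulas.
params = dict(d_s=40, d_t=50, n_min=10, a=3, q=4, dist=Fraction(1, 2), g=Fraction(1, 20))
total = sum(step_contributions(**params))
print(total, closed_form(**params), float(total))
```

The leading part $1/d_s + 1/d_t$ dominates as soon as $N_{\min}$ is large, which is the mechanism behind the convergence results.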

Note that as spelled out above, the proof works whenever the dimension of the space satisfies 
d > 3. In particular, note that even if d is large, we only use a 3-dimensional "hypercube" in Step 
3. It is sufficient to give the rate we need, and carrying out the construction for higher-dimensional 
hypercube (in particular Step 3b) is a pain that we wanted to avoid. 

The special case d = 3. In this case, everything works very similarly to the above, except that we
only use a 2-dimensional "hypercube" (this is what we always show in the figures). The only place
in the proof where this really makes a difference is in Step 3. The number $z_i$ of grid cells in Layer
i is given as $z_i = 8i$. Consequently, instead of obtaining an over-harmonic sum in (11) we obtain a
harmonic sum. Using the well-known fact that $\sum_{i=1}^{a} 1/i \le \log(a) + 1$ we obtain

$$r_3 \;\le\; \frac{2}{N_{\min}^2} \left( 1 + \frac{1}{8} \sum_{i=1}^{a-1} \frac{1}{i} \right) \;\le\; \frac{1}{N_{\min}^2} \left( 2 + \log(a) \right).$$

In Step 4 we just have to replace the terms $(2a + 1)^3$ by $(2a + 1)^2$. This leads to the result in
Proposition 19.

The special case d = 2. Here our "hypercube" only consists of a "pillar" of 2a + 1 cells. The
fundamental difference to higher dimensions is that in Step 3, the flow does not have so much
"space" to be distributed. Essentially, we have to distribute all unit flow through a "pillar", which
results in contributions

$$r_3 \;\le\; \frac{2a}{N_{\min}^2}, \qquad r_4 \;\le\; \frac{d(s,t)}{g (2a + 1) N_{\min}^2}.$$

This concludes the proof of Proposition 19. □



Let us make a couple of technical remarks about this proof. For ease of presentation we
simplified the proof in a couple of respects.

Strictly speaking, we do not need to distribute the whole unit flow to the outermost Layer a. The
reason is that in each layer, a fraction of the flow already "branches off" in the direction of t. We
simply ignore this leaving flow when bounding the flow in Step 3, so our construction still leads to an
upper bound. It is not difficult to take the outbound flow into account, but it does not change
the order of magnitude of the final result. So for ease of presentation we drop this additional
complication and stick to our rough upper bound.

When we consider Steps 2 and 3 together, it turns out that we might have introduced some loops
in the flow. To construct a proper flow, we can simply remove these loops. This would then just
reduce the contribution of Steps 2 and 3, so that our current estimate is an overestimation of the
whole resistance.

The proof as it is spelled out above considers the case where s and t are connected by a straight
line. It can be generalized to the case where they are connected by a piecewise linear path. This
does not change the result by more than constants, but adds some technicality at the corners of
the paths.

The construction of the flow only works if the bottleneck of X is not smaller than the diameter of
one grid cell, if s and t are at least a couple of grid cells apart from each other, and if s and t are
not too close to the boundary of X. We took care of these conditions in Part 3 of the definition of
a valid grid.



5.3 Proofs of Theorems 3 and 4

First of all, note that by Rayleigh's principle (cf. Corollary 7 in Section IX.2 of Bollobás 1998)
the effective resistance between vertices cannot decrease if we delete edges from the graph. Given
a sample from the underlying density p, a random geometric graph based on this sample, and
some valid region X, we first delete all points that are not in X. Then we consider the remaining
geometric graph. The effective resistances on this graph are upper bounds on the resistances of
the original graph. Then we conclude the proofs with the following arguments:

Proof of Theorem 3. The lower bound on the deviation follows immediately from Proposition
16. The upper bound is a consequence of Proposition 19 and well known properties of random
geometric graphs (summarized in the appendix). In particular, note that we can choose the grid
width $g := \varepsilon / (2 \sqrt{d - 1})$ to obtain a valid grid. The quantity $N_{\min}$ can be bounded as stated in
Proposition 29 and is of order $n \varepsilon^d$; the degrees behave as described in Proposition 30 and are also
of order $n \varepsilon^d$ (we use $\delta = 1/2$ in these results for simplicity). The quantity a in Proposition 19 is
of the order $1/\varepsilon$, and Q can be bounded by $Q = \varepsilon/g$, which by the choice of g is indeed a constant.
Plugging all these results together leads to the final statement of the theorem. □

Proof of Theorem 4. This proof is analogous to the ε-graph. As grid width g we choose
$g = R_{k,\min} / (2 \sqrt{d - 1})$ where $R_{k,\min}$ is the minimal k-nearest neighbor distance (note that this
works for both the symmetric and the mutual kNN-graph). Exploiting Propositions 29 and 31 we
can see that $R_{k,\min}$ and $R_{k,\max}$ are of order $(k/n)^{1/d}$, the degrees and $N_{\min}$ are of order k, a is of
the order $(n/k)^{1/d}$ and Q is a constant. Now the statements of the theorem follow from Proposition
19. □

6 Proofs for the spectral approach
6.1 Proof of the key Propositions 1 and 5

In this section we prove the general formulas to compute and approximate the hitting times.

Proof of Proposition 1. For the hitting time formula, let $u_1, \ldots, u_n$ be an orthonormal set of
eigenvectors of $L_{\mathrm{sym}}$ corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $u_{ij}$ denote the j-th entry of
$u_i$. According to Lovász (1993) the hitting time is given by



Hij = vol 



A straightforward calculation using the spectral representation of isym yields 

\ ^ fc=2 



The result for the commute time follows from the one for the hitting times. □



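In order to prove Proposition 5 we first state a small lemma. For convenience, we set
$A = D^{-1/2} W D^{-1/2}$ and $u_i = e_i / \sqrt{d_i}$. Furthermore, we are going to denote the projection on the
eigenspace of the j-th eigenvalue of A by $P_j$.

Lemma 20 (Pseudo-inverse $L_{\mathrm{sym}}^\dagger$) The pseudo-inverse of the symmetric Laplacian satisfies

$$L_{\mathrm{sym}}^\dagger = I - P_1 + M,$$

where I denotes the identity matrix and M is given as follows:

$$M \;=\; \sum_{k=1}^{\infty} (A - P_1)^k \;=\; \sum_{r=2}^{n} \frac{\lambda_r}{1 - \lambda_r} P_r. \qquad (12)$$

Furthermore, for all $u, v \in \mathbb{R}^n$ we have

$$|\langle u, M v \rangle| \;\le\; \frac{1}{1 - \lambda_2} \, \|(A - P_1) u\| \cdot \|(A - P_1) v\| + |\langle u, (A - P_1) v \rangle|. \qquad (13)$$

Proof. The projection onto the null space of $L_{\mathrm{sym}}$ is given by $P_1 = \sqrt{d} \sqrt{d}^{\,T} / \sum_{i=1}^{n} d_i$, where
$\sqrt{d} = (\sqrt{d_1}, \ldots, \sqrt{d_n})^T$. As the graph is not bipartite, $\lambda_n > -1$. Thus the pseudo-inverse of $L_{\mathrm{sym}}$
can be computed as

$$L_{\mathrm{sym}}^\dagger = (I - A)^\dagger = (I - A + P_1)^{-1} - P_1 = \sum_{k=0}^{\infty} (A - P_1)^k - P_1.$$

Thus

$$M := \sum_{k=1}^{\infty} (A - P_1)^k = \sum_{k=0}^{\infty} (A - P_1)^k (A - P_1) = \sum_{k=0}^{\infty} \sum_{r=2}^{n} \lambda_r^{k+1} P_r = \sum_{r=2}^{n} \frac{\lambda_r}{1 - \lambda_r} P_r,$$

which proves Equation (12). By a little detour, we can also see

$$M = \sum_{k=0}^{\infty} (A - P_1)^k (A - P_1)^2 + (A - P_1) = \left( \sum_{r=2}^{n} \frac{1}{1 - \lambda_r} P_r \right) (A - P_1)^2 + (A - P_1).$$

Exploiting that $(A - P_1)$ commutes with all $P_r$ gives

$$\langle u, M v \rangle = \left\langle (A - P_1) u, \; \left( \sum_{r=2}^{n} \frac{1}{1 - \lambda_r} P_r \right) (A - P_1) v \right\rangle + \langle u, (A - P_1) v \rangle.$$

Applying the Cauchy-Schwarz inequality and the fact $\left\| \sum_{r=2}^{n} \frac{1}{1 - \lambda_r} P_r \right\|_2 = 1/(1 - \lambda_2)$ leads to the
desired statement. □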
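
The statements of the lemma can be verified numerically on a small example. The sketch below is an illustration, not part of the proof: the random weight matrix, its size and the seed are arbitrary assumptions (self-loops are kept, so the graph is connected and not bipartite). It builds M once from the spectral formula and once from the truncated power series, checks that $I - P_1 + M$ satisfies the defining identities of the pseudo-inverse of $L_{\mathrm{sym}}$, and evaluates both sides of the inequality for one random pair of vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
W = rng.uniform(0.5, 1.5, size=(n, n))
W = (W + W.T) / 2                          # symmetric, strictly positive weights
d = W.sum(axis=1)
sqrt_d = np.sqrt(d)
A = W / np.outer(sqrt_d, sqrt_d)           # D^{-1/2} W D^{-1/2}
P1 = np.outer(sqrt_d, sqrt_d) / d.sum()    # projection onto the null space of L_sym
L_sym = np.eye(n) - A
B = A - P1

lam, V = np.linalg.eigh(A)                 # ascending order: lam[-1] = 1, lam[-2] = lambda_2
lam2 = lam[-2]

# M from the spectral formula: sum over all eigenpairs except the top eigenvalue 1.
M_spec = sum(lam[r] / (1 - lam[r]) * np.outer(V[:, r], V[:, r]) for r in range(n - 1))

# M from the power series sum_{k >= 1} (A - P1)^k, truncated after 200 terms.
M_series = np.zeros((n, n))
term = np.eye(n)
for _ in range(200):
    term = term @ B
    M_series += term

L_pinv = np.eye(n) - P1 + M_spec           # claimed pseudo-inverse of L_sym

u, v = rng.normal(size=n), rng.normal(size=n)
lhs = abs(u @ M_spec @ v)
rhs = np.linalg.norm(B @ u) * np.linalg.norm(B @ v) / (1 - lam2) + abs(u @ B @ v)
print("series vs spectral M agree:", np.allclose(M_series, M_spec))
print("inequality:", lhs, "<=", rhs)
```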



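Proof of Proposition 5. This proposition now follows easily from the lemma above. Observe
that

$$\langle u_i, A u_j \rangle = \frac{w_{ij}}{d_i d_j} \le \frac{w_{\max}}{d_{\min}^2}, \qquad \|A u_i\|^2 = \sum_{k=1}^{n} \frac{w_{ik}^2}{d_i^2 d_k} \le \frac{w_{\max}}{d_{\min}^2},$$

$$\|A (u_i - u_j)\|^2 \;\le\; \frac{w_{\max}}{d_{\min}} \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \;\le\; \frac{2 w_{\max}}{d_{\min}^2}.$$

Exploiting that $P_1 (u_i - u_j) = 0$, we get for the hitting time

$$\left| \frac{1}{\mathrm{vol}(G)} H_{ij} - \frac{1}{d_j} \right| = |\langle u_j, M (u_j - u_i) \rangle| \;\le\; \frac{1}{1 - \lambda_2} \|A u_j\| \cdot \|A (u_j - u_i)\| + |\langle u_j, A (u_j - u_i) \rangle| \;\le\; 2 \left( \frac{1}{1 - \lambda_2} + 1 \right) \frac{w_{\max}}{d_{\min}^2}. \qquad (14)$$

For the commute time, we note that

$$\left| \frac{1}{\mathrm{vol}(G)} C_{ij} - \left( \frac{1}{d_i} + \frac{1}{d_j} \right) \right| = |\langle u_i - u_j, M (u_i - u_j) \rangle| \;\le\; \frac{1}{1 - \lambda_2} \|A (u_i - u_j)\|^2 + |\langle u_i - u_j, A (u_i - u_j) \rangle| \;\le\; \frac{w_{\max}}{d_{\min}} \left( \frac{1}{1 - \lambda_2} + 2 \right) \left( \frac{1}{d_i} + \frac{1}{d_j} \right). \qquad (15)$$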
We would like to point out that the key to achieving this bound is not to give in to the temptation
to manipulate Eq. (12) directly, but to bound Eq. (13). The reason is that we can compute terms
of the form $\langle u_i, A u_j \rangle$ and related terms explicitly, whereas we do not have any explicit formulas
for the eigenvalues and eigenvectors in (12).
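
The resulting bounds on hitting and commute times are easy to test on a concrete graph. The following sketch is only an illustration: the complete graph on 8 vertices with random weights, the seed and all variable names are assumptions made here. It computes exact hitting times from the pseudo-inverse of $L_{\mathrm{sym}}$, takes $\lambda_2$ as the second largest eigenvalue of $A = D^{-1/2} W D^{-1/2}$, and compares the observed deviations with the bounds $2 (1/(1-\lambda_2) + 1) \, w_{\max}/d_{\min}^2$ and $2 (1/(1-\lambda_2) + 2) \, w_{\max}/d_{\min}^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
W = np.triu(rng.uniform(0.5, 1.5, size=(n, n)), k=1)
W = W + W.T                                     # weighted complete graph, no self-loops

d = W.sum(axis=1)
vol = d.sum()
s = 1.0 / np.sqrt(d)
A = s[:, None] * W * s[None, :]                 # D^{-1/2} W D^{-1/2}
P1 = np.outer(np.sqrt(d), np.sqrt(d)) / vol
L_pinv = np.linalg.inv(np.eye(n) - A + P1) - P1  # pseudo-inverse of L_sym = I - A

# Hitting times H[i, j] and commute times C[i, j].
H = vol * (np.diag(L_pinv)[None, :] / d[None, :] - L_pinv * np.outer(s, s))
C = H + H.T

lam2 = np.sort(np.linalg.eigvalsh(A))[-2]       # second largest eigenvalue of A
w_max, d_min = W.max(), d.min()
off = ~np.eye(n, dtype=bool)

dev_H = np.abs(H / vol - 1.0 / d[None, :])[off].max()
bound_H = 2.0 * (1.0 / (1.0 - lam2) + 1.0) * w_max / d_min**2

inv_deg_sum = 1.0 / d[:, None] + 1.0 / d[None, :]
dev_C = np.abs(C / vol - inv_deg_sum)[off].max()
bound_C = 2.0 * (1.0 / (1.0 - lam2) + 2.0) * w_max / d_min**2

print("hitting times: max deviation", dev_H, "bound", bound_H)
print("commute times: max deviation", dev_C, "bound", bound_C)
```

On dense graphs like this one the observed deviations tend to sit well below the bounds, since the bounds only use the extreme quantities $w_{\max}$, $d_{\min}$ and $\lambda_2$.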



6.2 The spectral gap in random geometric graphs 

As we have seen above, a key ingredient in the approximation result for hitting times and commute
distances is the spectral gap. In this section we show how the spectral gap can be lower bounded
for random geometric graphs. We first consider the case of a fixed geometric graph. From this
general result we then derive the results for the special cases of the ε-graph and the kNN-graphs.
All graphs considered in this section are unweighted and undirected. We follow the strategy in
Boyd et al. (2005), where the spectral gap is bounded by means of the Poincaré inequality (see
Diaconis and Stroock (1991) for a general introduction to this technique; see Cooper and Frieze
(2009) for a related approach in simpler settings). The outline of this technique is as follows: for
each pair (X, Y) of vertices in the graph we need to select a path $\gamma_{XY}$ in the graph that connects
these two vertices. In our case, this selection is made in a random manner. Then we need to
consider all edges in the graph and investigate how many of the paths $\gamma_{XY}$, on average, traverse
this edge. We need to control the maximum of this "load" over all edges. The higher this load is,
the more pronounced is the bottleneck in the graph, and the smaller the spectral gap is. Formally,
the spectral gap is related to the maximum average load b as follows.

Figure 5: Canonical path between a and b. We first consider a "Hamming path of cells" between
a and b. In all intermediate cells, we randomly pick a point.



Proposition 21 (Spectral gap, Diaconis and Stroock 1991) Consider a finite, connected,
undirected, unweighted graph that is not bipartite. For each pair of vertices $X \neq Y$ let $P_{XY}$ be a
probability distribution over all paths that connect $X$ and $Y$ and have uneven length. Let $(\gamma_{XY})_{X,Y}$ be a
family of paths independently drawn from the respective $P_{XY}$. Define
$b := \max_{e \text{ edge}} E\,|\{\gamma_{XY} \mid e \in \gamma_{XY}\}|$.
Denote by $|\gamma_{\max}|$ the maximum path length (where the length of the path is the number of
edges in the path). Then the spectral gap in the graph is bounded as follows:

$$1 - \lambda_2 \geq \frac{\mathrm{vol}(G)}{d_{\max}^2\, |\gamma_{\max}|\, b} \qquad \text{and} \qquad 1 - |\lambda_n| \geq \frac{2}{d_{\max}\, |\gamma_{\max}|\, b}. \tag{16}$$



For deterministic sets $\Gamma$, this proposition has been derived as Corollary 1 and 2 in Diaconis and
Stroock (1991). The adaptation for random selection of paths is straightforward, see Boyd et al.
(2005).
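As a sanity check of the first inequality in (16), the following minimal sketch evaluates both sides on the complete graph $K_6$, where every pair of vertices can be joined by its direct edge (a path of uneven length 1). The helper name `random_walk_spectrum` and the choice $b = 1$ (each edge carries only the path between its own endpoints) are assumptions made for this illustration; the second inequality in (16) is not examined here.

```python
import numpy as np

def random_walk_spectrum(A):
    """Eigenvalues of P = D^{-1} A, sorted in decreasing order (hypothetical helper)."""
    d = A.sum(axis=1)
    # D^{-1/2} A D^{-1/2} is symmetric and similar to P, so eigvalsh applies.
    S = A / np.sqrt(np.outer(d, d))
    return np.sort(np.linalg.eigvalsh(S))[::-1]

n = 6
A = np.ones((n, n)) - np.eye(n)   # complete graph K_6: connected, not bipartite
eigenvalues = random_walk_spectrum(A)

d_max = A.sum(axis=1).max()       # maximal degree, here n - 1
vol = A.sum()                     # sum of all degrees, here n (n - 1)
gamma_max = 1                     # every chosen path is a single edge
b = 1                             # assumption: one path per edge

gap = 1.0 - eigenvalues[1]
bound = vol / (d_max**2 * gamma_max * b)
print(gap, bound)
```

On this highly symmetric example the two printed numbers coincide up to rounding, so the bound cannot be improved in general without further assumptions on the graph.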



The key to tight bounds based on Proposition 21 is a clever choice of the paths. We need to make
sure that we distribute the paths as "uniformly" as possible over the whole graph. This is relatively
easy to achieve in the special situation where $\mathcal{X}$ is a torus with uniform distribution (as studied
in Boyd et al. 2005, Cooper and Frieze 2009) because of symmetry arguments and the absence
of boundary effects. However, in our setting with general $\mathcal{X}$ and $p$ we have to invest quite some
work.



6.2.1 Fixed geometric graph on the unit cube in $\mathbb{R}^d$

We first treat the special case of a fixed geometric graph with vertices in the unit cube $[0,1]^d$ in
$\mathbb{R}^d$. Consider a grid on the cube with grid width $g$. For now we assume that the grid cells are so
small that points in neighboring cells are always connected in the geometric graph, and so large
that each cell contains a minimal number of data points. We will specify the exact value of $g$ later.
In the following, cells of the grid are identified with their center points.

Construction of the paths. Assume we want to construct a path between two vertices $a$ and $b$
that correspond to the points $a = (a_1, \dots, a_d)$, $b = (b_1, \dots, b_d) \in [0,1]^d$. Let $C(a)$ and $C(b)$ denote
the grid cells containing $a$ and $b$, and denote the centers of these cells by $c(a) = (c(a)_1, \dots, c(a)_d)$ and
$c(b) = (c(b)_1, \dots, c(b)_d)$. We first construct a deterministic "cell path" between the cells $C(a)$ and
$C(b)$ (see Figure 5). This path simply follows a Hamming path: starting at cell $C(a)$ we change the
first coordinate until we have reached $c(b)_1$. For example, if $c(a)_1 < c(b)_1$ we traverse the cells

$$(c(a)_1, c(a)_2, \dots, c(a)_d) \to (c(a)_1 + g, c(a)_2, \dots, c(a)_d) \to \dots \to (c(b)_1, c(a)_2, \dots, c(a)_d).$$

Then we move along the second coordinate from $c(a)_2$ until we have reached $c(b)_2$, that is, we
traverse the cells $(c(b)_1, *, c(a)_3, \dots, c(a)_d)$. And so on. This gives a deterministic way of traversing
adjacent cells from $C(a)$ to $C(b)$. Now we transform this deterministic "cell path" to a random
path on the graph. In the special cases where $a$ and $b$ are in the same cell or in neighboring cells,
we directly connect $a$ and $b$ by an edge. In the general case, we select one data point uniformly at
random in each of the interior cells on the cell path. Then we connect the selected points to form
a path. Note that we can always force the paths to have uneven lengths by adding one more point
somewhere in between.
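The deterministic cell path described above can be sketched as follows. This is only an illustration under the assumption that grid cells are addressed by integer index tuples (the cell with index $m$ in some coordinate has center $(m + 1/2)\,g$ in that coordinate); the function name `hamming_cell_path` is hypothetical. The random choice of one data point per interior cell is not part of this sketch.

```python
def hamming_cell_path(start, end):
    """Cells visited when moving from `start` to `end` one coordinate at a time."""
    cell = list(start)
    path = [tuple(cell)]
    for axis in range(len(cell)):
        # Walk along the current coordinate until it matches the target cell.
        step = 1 if end[axis] > cell[axis] else -1
        while cell[axis] != end[axis]:
            cell[axis] += step
            path.append(tuple(cell))
    return path

path = hamming_cell_path((0, 0, 2), (2, 1, 0))
print(path)
```

Consecutive cells in the returned list differ by exactly one step in exactly one coordinate, which is what the validity conditions of the path construction rely on.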

Proposition 22 (Path construction is valid) Assume that (1) Each cell of the grid contains 
at least one data point. (2) Data points in the same and in neighboring cells are always connected 
in the graph. Then the graph is connected, and the paths constructed above are paths in the graph. 

Proof. Obvious, by construction of the paths. □

In order to apply Proposition 21 we now need to compute the maximal average load of all paths.

Proposition 23 (Maximum average load for fixed graph on cube) Consider a geometric
graph on $[0,1]^d$ and the grid of width $g$ on $[0,1]^d$. Denote by $N_{\min}$ and $N_{\max}$ the minimal and
maximal number of points per grid cell. Construct a random set of paths as described above.

1. Let $C$ be any fixed cell in the grid. Then there exist at most $d/g^{d+1}$ pairs of cells $(A, B)$ such
that cell paths starting in cell $A$ and ending in cell $B$ pass through $C$.

2. If the path construction is valid, then the maximal average load is upper bounded by

$$b \leq 1 + \left( \frac{N_{\max}^2}{N_{\min}^2} + 2\,\frac{N_{\max}}{N_{\min}} \right) \frac{d}{g^{d+1}}.$$

Proof. Part 1. We identify cells with their centers. Consider two different grid cells $A$ and $B$
with centers $a$ and $b$. By construction, the Hamming path between $A$ and $B$ has the corners

$$a = (a_1, a_2, a_3, \dots, a_d) \to (b_1, a_2, a_3, \dots, a_d) \to (b_1, b_2, a_3, \dots, a_d) \to \dots \to (b_1, b_2, b_3, \dots, b_{d-1}, a_d) \to (b_1, b_2, b_3, \dots, b_{d-1}, b_d) = b.$$

All cells on the path have the form $(b_1, b_2, \dots, b_{l-1}, *, a_{l+1}, \dots, a_d)$ where $*$ can take any value
between $a_l$ and $b_l$. A path can only pass through the fixed cell with center $c$ if there exists some
$l \in \{1, \dots, d\}$ such that

$$(c_1, \dots, c_d) = (b_1, b_2, \dots, b_{l-1}, *, a_{l+1}, \dots, a_d).$$

That is, there exists some $l \in \{1, \dots, d\}$ such that

(I) $b_i = c_i$ for all $i = 1, \dots, l-1$ and (II) $a_i = c_i$ for all $i = l+1, \dots, d$.

For the given grid size $g$ there are $1/g$ different cell centers per dimension. For fixed $l$ there thus
exist $1/g^{d-l+1}$ cell centers that satisfy (I) and $1/g^{l}$ cell centers that satisfy (II). So all in all there
are $1/g^{d+1}$ pairs of cells $A$ and $B$ such that both (I) and (II) are satisfied for a fixed value of $l$.
Adding up the possibilities for all choices of $l \in \{1, \dots, d\}$ leads to the factor $d$.

Part 2. Fix an edge $e$ in the graph and consider its two adjacent vertices $v_1$ and $v_2$. If $v_1$ and
$v_2$ are in two different cells that are not neighbors to each other, then by construction none of the
paths traverses the edge. If they are in the same cell, by construction at most one of the paths can
traverse this edge, namely the one directly connecting the two points. The interesting case is the
one where $v_1$ and $v_2$ lie in two neighboring grid cells $C$ and $C'$.

If both cells are "interior" cells of the path, then by construction each edge connecting the two
cells has equal probability of being selected. As there are at least $N_{\min}$ points in each cell, there
are at least $N_{\min}^2$ different edges between these cells. Thus each of the edges between the cells is
selected with probability at most $1/N_{\min}^2$. We know by Part 1 that there are at most $d/g^{d+1}$ pairs
of start/end cells. As each cell contains at most $N_{\max}$ points, this leads to $N_{\max}^2\, d/g^{d+1}$ different
paths passing through $C$. This is also an upper bound on the number of paths passing through
both $C$ and $C'$. Thus, each edge is selected by at most $d N_{\max}^2/(g^{d+1} N_{\min}^2)$ paths.

If at least one of the cells is the start cell of the path, then the corresponding vertex, say $v_1$, is the
start point of the path. If $v_2$ is an intermediate point, then it is selected with probability at most
$1/N_{\min}$ (the case where $v_2$ is an end point has already been treated at the beginning). Similarly
to the last case, there are at most $N_{\max}\, d/g^{d+1}$ paths that start in $v_1$ and pass through $C'$. This
leads to an average load of $d N_{\max}/(g^{d+1} N_{\min})$ on edge $e$. The same holds with the roles of $v_1$ and
$v_2$ exchanged, leading to a factor 2.

The overall average load is now the sum of the average loads in the different cases. □
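The counting argument of Part 1 can be checked by brute force on a small grid. The sketch below assumes integer cell indices with $m = 1/g$ cells per dimension, so that the bound $d/g^{d+1}$ reads $d\,m^{d+1}$; the names `hamming_cell_path` and `pairs_through` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

def hamming_cell_path(start, end):
    """Cells visited when moving from `start` to `end` one coordinate at a time."""
    cell = list(start)
    path = [tuple(cell)]
    for axis in range(len(cell)):
        step = 1 if end[axis] > cell[axis] else -1
        while cell[axis] != end[axis]:
            cell[axis] += step
            path.append(tuple(cell))
    return path

def pairs_through(cell, m, d):
    """Number of ordered pairs (A, B), A != B, whose cell path visits `cell`."""
    cells = list(product(range(m), repeat=d))
    return sum(1 for a in cells for b in cells
               if a != b and cell in hamming_cell_path(a, b))

m, d = 4, 2                       # grid width g = 1/4 in dimension d = 2
bound = d * m ** (d + 1)          # d / g^{d+1}
worst = max(pairs_through(c, m, d) for c in product(range(m), repeat=d))
print(worst, bound)
```

The brute-force maximum stays below the bound because the argument in the proof counts some pairs once for every admissible index $l$.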



6.2.2 Fixed geometric graph on a domain $\mathcal{X}$ that is homeomorphic to a cube

Now assume that $\mathcal{X} \subset \mathbb{R}^d$ is a compact subset that is homeomorphic to the cube $[0,1]^d$ in the
following sense: we assume that there exists a homeomorphism $h : \mathcal{X} \to [0,1]^d$ and constants
$0 < L_{\min} < L_{\max} < \infty$ such that for all $x, y \in \mathcal{X}$ we have

$$L_{\min} \|x - y\| \leq \|h(x) - h(y)\| \leq L_{\max} \|x - y\|. \tag{17}$$

The general idea is now as follows. Assume we are given a geometric graph on $X_1, \dots, X_n \in \mathcal{X}$. In
order to construct the paths we first map the points into the cube using $h$. Then we construct the
paths on $h(X_1), \dots, h(X_n) \in [0,1]^d$ as in the last section. Finally, we map the path back to $\mathcal{X}$.

Proposition 24 (Maximum average load for fixed graph on general domain) Let $G$ be
a geometric graph based on $X_1, \dots, X_n \in \mathcal{X}$. Assume that there exists some $\tilde g > 0$ such that points
of distance smaller than $\tilde g$ are always connected in the graph. Consider a mapping $h : \mathcal{X} \to [0,1]^d$
as in Equation (17) and a grid of width $g$ on $[0,1]^d$. Let $(C_i)_i$ be the cells of the $g$-grid on $[0,1]^d$, and
denote their centers by $c_i$. Let $B_i$ and $B_i'$ be balls in $\mathcal{X}$ with radius $r = g/(2L_{\max})$ and $R = \sqrt{d}\, g/L_{\min}$,
centered at $h^{-1}(c_i)$.

1. These balls satisfy $B_i \subseteq h^{-1}(C_i) \subseteq B_i'$.

2. Denote by $N_{\min}$ the minimal number of points in $B_i$ and by $N_{\max}$ the maximal number of
points in $B_i'$. Construct paths between the points $h(X_i) \in [0,1]^d$ as described in the previous
subsection. If $N_{\min} \geq 1$ and $g \leq L_{\min}\, \tilde g/\sqrt{d+3}$, then these paths are valid.

3. In this case, the maximal average load can be upper bounded by

$$1 + \left( \frac{N_{\max}^2}{N_{\min}^2} + 2\,\frac{N_{\max}}{N_{\min}} \right) \frac{d}{g^{d+1}}.$$

Proof. Part 1. Let $c_i$ be the center of cell $C_i$ and consider the ball $B_i$ centered at $h^{-1}(c_i)$
with radius $g/(2L_{\max})$. Clearly, $h^{-1}(c_i)$ is an interior point of $h^{-1}(C_i)$. Suppose that there exists
$x \in B_i \cap \partial h^{-1}(C_i)$. Since $h$ maps the boundary of $h^{-1}(C_i)$ onto the boundary of $C_i$, we conclude
that $h(x) \in \partial C_i$ and thus $\|h(x) - c_i\| \geq g/2$. By our assumption on the homeomorphism we can
estimate

$$\|x - h^{-1}(c_i)\| \geq \frac{1}{L_{\max}} \|h(x) - c_i\| \geq \frac{g}{2L_{\max}} = r.$$

Hence, $B_i \subseteq h^{-1}(C_i)$. To show the other statement let $x, y \in h^{-1}(C_i)$. Then

$$\|x - y\| \leq \frac{1}{L_{\min}} \|h(x) - h(y)\| \leq \frac{1}{L_{\min}} \operatorname{diam} C_i = \frac{\sqrt{d}\, g}{L_{\min}} = R.$$

Part 2. By the definition of $N_{\min}$ it is clear that each cell of the grid contains at least one point.
Consider two points $X_i, X_j \in \mathcal{X}$ such that $h(X_i)$ and $h(X_j)$ are in neighboring cells of the $g$-grid.
Then $\|h(X_i) - h(X_j)\| \leq g\sqrt{d+3}$. By the properties of $h$,

$$\|X_i - X_j\| \leq \frac{1}{L_{\min}} \|h(X_i) - h(X_j)\| \leq \frac{1}{L_{\min}} \sqrt{d+3}\; g \leq \tilde g.$$

Thus, by the definition of $\tilde g$ the points $X_i$ and $X_j$ are connected in $G$.

Part 3. Follows directly from Proposition 23. □






6.2.3 Spectral gap for the $\varepsilon$-graph

Now we are going to apply Proposition 24 to $\varepsilon$-graphs. We will use the general results on $\varepsilon$-graphs
summarized in the appendix.

Proposition 25 (Maximal average load for $\varepsilon$-graph) Assume that $\mathcal{X}$ is homeomorphic to the
cube with a mapping $h$ as described in Equation (17). Then there exist constants $c_1, c_2, c_3 > 0$ such
that with probability at least $1 - c_1 \exp(-c_2 n\varepsilon^d)/\varepsilon^d$, the maximum average load is upper bounded
by $c_3/\varepsilon^{d+1}$. If $n\varepsilon^d/\log n \to \infty$, then this probability tends to 1 as $n \to \infty$.



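Proof. The proof is based on Proposition 24. By construction we know that points with distance
at most $\tilde g = \varepsilon$ are always connected in the $\varepsilon$-graph. By Part 2 of Proposition 24, to ensure that
points in neighboring grid cells are always connected in the graph we thus need to choose the grid
width $g = \varepsilon \cdot L_{\min}/\sqrt{d+3}$. The radius $r$ defined in Proposition 24 is then given as

$$r = \frac{g}{2L_{\max}} = \varepsilon \cdot \frac{L_{\min}}{2 L_{\max} \sqrt{d+3}}.$$

The probability mass of the balls $B_i$ is thus bounded by

$$b_{\min} \geq r^d \eta_d\, p_{\min}\, \alpha = \varepsilon^d \cdot \frac{L_{\min}^d}{L_{\max}^d} \cdot \frac{\eta_d\, p_{\min}\, \alpha}{2^d (d+3)^{d/2}} =: \varepsilon^d \cdot c_{\min}.$$

We have

$$K = 1/g^d = \left( \sqrt{d+3}/L_{\min} \right)^d \cdot (1/\varepsilon^d) =: \kappa \cdot (1/\varepsilon^d)$$

grid cells and thus the same number of balls $B_i$. We can now apply Proposition 29 (with $\delta = 1/2$)
to deduce the bound for the quantity $N_{\min}$ used in Proposition 24:

$$P\left( N_{\min} \leq n\varepsilon^d c_{\min}/2 \right) \leq \frac{\kappa}{\varepsilon^d} \exp\left( -n\varepsilon^d c_{\min}/12 \right).$$

Analogously, for $N_{\max}$ we use $R = \varepsilon \geq \varepsilon\sqrt{d}/\sqrt{d+3}$ and $b_{\max} \leq R^d \eta_d\, p_{\max} = \varepsilon^d \eta_d\, p_{\max} =: \varepsilon^d c_{\max}$.
With $\delta = 0.5$ we then obtain

$$P\left( N_{\max} \geq 3 n\varepsilon^d c_{\max}/2 \right) \leq \frac{\kappa}{\varepsilon^d} \exp\left( -n\varepsilon^d c_{\max}/12 \right).$$

Plugging these values into Proposition 24 leads to the result. □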
We are now ready to prove Theorem 6 by applying Proposition 21. With probability at least
$1 - c_1 n \exp(-c_2 n\varepsilon^d)$, both the minimal and maximal degrees in the graph are of the order $\Theta(n\varepsilon^d)$
(cf. Proposition 30), and the volume of $G$ is of order $\Theta(n^2\varepsilon^d)$. To compute the maximal number
$|\gamma_{\max}|$ of edges in each of the paths constructed above, observe that each path can traverse at most
$d \cdot 1/g = (d\sqrt{d+3}/L_{\min}) \cdot (1/\varepsilon)$ cubes, and a path contains just one edge per cube. Thus $|\gamma_{\max}|$ is of
the order $\Theta(1/\varepsilon)$. In Proposition 25 we have seen that with probability at least $1 - c_4 \exp(-c_5 n\varepsilon^d)/\varepsilon^d$
the maximum average load $b$ is of the order $O(1/\varepsilon^{d+1})$. Plugging all these quantities in Proposition
21 leads to the result.
6.2.4 Spectral gap for the kNN-graph

As in the case of the flow proofs, the techniques in the case of the kNN-graphs are identical to the
ones for the $\varepsilon$-graph; we just have to replace the deterministic radius $\varepsilon$ by the minimal kNN-radius.
As before we exploit that if two sample points have distance less than $R_{k,\min}$ from each other, then
they are always connected both in the symmetric and mutual kNN-graph.

Proposition 26 (Maximal average load in the kNN-graph) Under the general assumptions,
with probability at least $1 - c_1 \cdot n \cdot \exp(-c_2 k)$ the maximal average load in both the symmetric and
mutual kNN-graph is bounded from above by $c_3 (n/k)^{1+1/d}$. If $k/\log n \to \infty$, then this probability
converges to 1.

Proof. This proof is completely parallel to the one of Proposition 25; the role of $\varepsilon$ is now taken
over by $R_{k,\min}$. □

Finally, the proof of Theorem 7 goes as follows. With probabilities at least $1 - n\exp(-c_1 k)$ the
following statements hold: the minimal and maximal degree are of order $\Theta(k)$, thus the number of
edges in the graph is of order $\Theta(nk)$. Analogously to the proof for the $\varepsilon$-graph, the maximal path
length $|\gamma_{\max}|$ is of the order $1/R_{k,\min} = \Theta((n/k)^{1/d})$. The maximal average load is of the order
$O((n/k)^{1+1/d})$. Plugging all these quantities in Proposition 21
leads to the result. □



6.3 Proofs of Corollaries 8 and 9

We have now collected all ingredients to present the following proofs.

Proof of Corollary 8

This is a direct consequence of the results on the minimal degree (Proposition 30) and the spectral
gap (Theorem 6). Plugging these results into Proposition 5 leads to the first result. The last
statement in the corollary follows by a standard density estimation argument, as the degree of a
vertex in the $\varepsilon$-graph is a consistent density estimator (see Proposition 30). □

Proof of Corollary 9

Follows similarly as Corollary 8 by applying Proposition 5. The results on the minimal degree and
the spectral gap can be found in Proposition 31 and Theorem 7. The last statement follows from
the convergence of the degrees, see Proposition 31. □



6.4 Weighted graphs 

For weighted graphs, we use the following results from the literature. 

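Proposition 27 (Spectral gap in weighted graphs)

1. For any row-stochastic matrix $P$,

$$\lambda_2 \leq \frac{1}{2} \max_{i,j} \sum_{k=1}^n \left| \frac{w_{ik}}{d_i} - \frac{w_{jk}}{d_j} \right| \leq 1 - n \min_{i,j} \frac{w_{ij}}{d_i} \leq 1 - \frac{w_{\min}}{w_{\max}}.$$

2. Consider a weighted graph $G$ with edge weights $0 < w_{\min} \leq w_{ij} \leq w_{\max}$ and denote its second
eigenvalue by $\lambda_{2,\text{weighted}}$. Consider the corresponding unweighted graph where all edge weights
are replaced by 1, and denote its second eigenvalue by $\lambda_{2,\text{unweighted}}$. Then we have

$$(1 - \lambda_{2,\text{unweighted}}) \cdot \frac{w_{\min}}{w_{\max}} \leq (1 - \lambda_{2,\text{weighted}}) \leq (1 - \lambda_{2,\text{unweighted}}) \cdot \frac{w_{\max}}{w_{\min}}.$$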

Proof.

1. This bound was obtained by Zenger (1972), see also Section 2.5 of Seneta (2006) for a
discussion. Note that the second inequality is far from being tight. But in our application,
both bounds lead to similar results.

2. This statement follows directly from the well-known representation of the second eigenvalue
$\mu_2$ of the normalized graph Laplacian $L_{\text{sym}}$ (see Sec. 1.2 in Chung 1997),

$$\mu_2 = \inf_{f \in \mathbb{R}^n} \frac{\sum_{\{i,j\} \in E} w_{ij} (f_i - f_j)^2}{\min_{c \in \mathbb{R}} \sum_{i=1}^n d_i (f_i - c)^2}.$$

Note that the eigenvalue $\mu_2$ of the normalized Laplacian and the eigenvalue $\lambda_2$ of the random
walk matrix $P$ are in relation $1 - \lambda_2 = \mu_2$.
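The second part of Proposition 27 can be illustrated numerically. The following sketch builds a small connected graph (a cycle with random chords, an arbitrary choice made only for this example), attaches random edge weights in $[w_{\min}, w_{\max}]$, and compares the weighted spectral gap with the two bounds obtained from the unweighted graph. The helper name `second_eigenvalue`, the graph model and all parameter values are assumptions of the illustration.

```python
import numpy as np

def second_eigenvalue(W):
    """Second largest eigenvalue of the random walk matrix D^{-1} W."""
    d = W.sum(axis=1)
    # D^{-1/2} W D^{-1/2} is symmetric and has the same spectrum as D^{-1} W.
    S = W / np.sqrt(np.outer(d, d))
    return np.sort(np.linalg.eigvalsh(S))[-2]

rng = np.random.default_rng(0)
n = 30

# Unweighted graph: a cycle (guarantees connectivity) plus random chords.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
chords = np.triu(rng.random((n, n)) < 0.2, k=1).astype(float)
A = np.maximum(A, chords + chords.T)

# Symmetric random weights in [w_min, w_max] on the existing edges.
w_min, w_max = 0.5, 2.0
weights = np.triu(rng.uniform(w_min, w_max, size=(n, n)), k=1)
W = A * (weights + weights.T)

gap_unweighted = 1.0 - second_eigenvalue(A)
gap_weighted = 1.0 - second_eigenvalue(W)
lower = (w_min / w_max) * gap_unweighted
upper = (w_max / w_min) * gap_unweighted
print(lower, gap_weighted, upper)
```

In this example the weighted gap is sandwiched between the two bounds; how tight the sandwich is depends on the ratio $w_{\max}/w_{\min}$.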
We now show two examples of how this proposition can be used. The first application of
Proposition 27 is the proof of Theorem 10, which follows directly from plugging the first part of
Proposition 27 into Theorem 5.

The second application of Proposition 27 is the following proof.

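Proof of Theorem 11.

We split

$$\left| nR_{ij} - \frac{1}{p(X_i)} - \frac{1}{p(X_j)} \right| \leq \left| nR_{ij} - \frac{n}{d_i} - \frac{n}{d_j} \right| + \left| \frac{n}{d_i} + \frac{n}{d_j} - \frac{1}{p(X_i)} - \frac{1}{p(X_j)} \right|.$$

Under the given assumption, the second term on the right hand side converges to 0 a.s. by a
standard kernel density estimation argument. The main work is the first term on the right hand
side. We treat upper and lower bounds of $R_{ij} - 1/d_i - 1/d_j$ separately.

To get a lower bound, recall that by Proposition 16 we have

$$R_{ij} \geq \frac{Q_{ij}}{1 + w_{ij} Q_{ij}},$$

where $Q_{ij} = 1/(d_i - w_{ij}) + 1/(d_j - w_{ij})$ and $w_{ij}$ is the weight of the edge between $i$ and $j$. It is
straightforward to see that under the given conditions,

$$n \left( R_{ij} - \frac{1}{d_i} - \frac{1}{d_j} \right) \geq n \left( \frac{Q_{ij}}{1 + w_{ij} Q_{ij}} - \frac{1}{d_i} - \frac{1}{d_j} \right) \to 0 \quad \text{a.s.}$$

To treat the upper bound, we define the $\varepsilon$-truncated Gauss graph as the graph with edge
weights

$$w_{ij}^{\varepsilon} := \begin{cases} w_{ij} & \text{if } \|X_i - X_j\| \leq \varepsilon, \\ 0 & \text{else.} \end{cases}$$

Let $d_i^{\varepsilon} = \sum_{j=1}^n w_{ij}^{\varepsilon}$. Because of $w_{ij}^{\varepsilon} \leq w_{ij}$ and Rayleigh's principle, we have $R_{ij} \leq R_{ij}^{\varepsilon}$, where $R^{\varepsilon}$
denotes the resistance of the $\varepsilon$-truncated Gauss graph. Obviously,

$$nR_{ij} - \frac{n}{d_i} - \frac{n}{d_j} \leq \underbrace{\left| nR_{ij}^{\varepsilon} - \frac{n}{d_i^{\varepsilon}} - \frac{n}{d_j^{\varepsilon}} \right|}_{(*)} + \underbrace{\left| \frac{n}{d_i^{\varepsilon}} + \frac{n}{d_j^{\varepsilon}} - \frac{n}{d_i} - \frac{n}{d_j} \right|}_{(**)}.$$

To bound term (**) we show that the degrees in the truncated graph converge to the ones in the
non-truncated graph. To see this, note that

$$E\left( \frac{d_i^{\varepsilon}}{n} \,\Big|\, X_i \right) = \frac{1}{(2\pi)^{d/2}} \frac{1}{h^d} \int_{B(X_i, \varepsilon)} e^{-\frac{\|X_i - y\|^2}{2h^2}}\, p(y)\, dy
= \frac{1}{(2\pi)^{d/2}} \int_{B(0, \varepsilon/h)} e^{-\frac{\|z\|^2}{2}}\, p(X_i + hz)\, dz
= E\left( \frac{d_i}{n} \,\Big|\, X_i \right) - \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d \setminus B(0, \varepsilon/h)} e^{-\frac{\|z\|^2}{2}}\, p(X_i + hz)\, dz.$$

Exploiting that

$$\frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d \setminus B(0, \varepsilon/h)} e^{-\frac{\|z\|^2}{2}}\, dz \leq e^{-\frac{\varepsilon^2}{4h^2}} \frac{1}{(2\pi)^{d/2}} \int_{\mathbb{R}^d} e^{-\frac{\|z\|^2}{4}}\, dz = 2^{d/2} e^{-\frac{\varepsilon^2}{4h^2}} = 2^{d/2} \left( \frac{1}{n\varepsilon^{d+2}} \right)^{1/4}$$

we obtain the convergence of the expectations: under the assumptions on $n$ and $h$ from the theorem,

$$\left| E\left( \frac{d_i^{\varepsilon}}{n} \,\Big|\, X_i \right) - E\left( \frac{d_i}{n} \,\Big|\, X_i \right) \right| \to 0.$$

Now, a probabilistic bound for term (**) can be obtained by standard concentration arguments.

We now bound term (*). In the following we implicitly define $\varepsilon$ via $h^2 = \varepsilon^2/\log(n\varepsilon^{d+2})$. Note that
for the given choice of $\varepsilon$, the truncated Gaussian graph "converges" to the non-truncated graph,
as we truncate less and less weight.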

Denote by weighted eigenvalues of the e-truncated Gauss graph, and by w^^^^, w;max its 
minimal and maximal edge weights. Also consider the graph G" that is the unweighted version of 
the e-truncated Gauss graph . Note that G" coincides with the standard e-graph. We denote 
its eigenvalues by ""weighted ^ gy applying Proposition [s] and Corollary [s] we get 



nRl 



< 



< 



\ £, weighted 
^2 



"min 



1 



1- A 



e, unweighted 



2 Z + Z 



(19) 



(20) 



where the first inequality holds with probability at least $1 - c_1 n \exp(-c_2 n h^d) - c_3 \exp(-c_4 n h^d)/h^d$.

By (**) we already know that the last factor of Term (20) converges to a constant:

$$\frac{n}{d_i^{\varepsilon}} + \frac{n}{d_j^{\varepsilon}} \to \frac{1}{p(X_i)} + \frac{1}{p(X_j)}.$$



For the other factors of Term (20) we use the following quantities: 

_2 1 1 

J? („e<i+2)i/2 



w, 



< 



1 



1 - A, > £2 



^min ^ ^min 



Plugging these quantities in ( 20 ) we obtain the convergence of ( 
Proof of Corollary 13

Proof. It is well known that under the given assumptions, the following properties hold with high
probability: the graph is connected and the minimal and average degrees are of the order $np$, in
particular $np/d_i$ converges to 1 in probability. The volume of the graph is of the order $n^2 p$. To
use Theorem 12, observe that the matrix $A = pJ$ where $J$ is the $(n \times n)$-matrix of all ones. The
expected degree of all vertices is $np$. Hence, $D^{-1/2} A D^{-1/2} = \frac{1}{np} \cdot A$. This matrix has rank 1,
its non-zero eigenvalue is 1 with the constant one vector as corresponding eigenvector. Hence the
expected spectral gap in this model is 1. It is easy to see that as soon as $np/\log(n) \to \infty$, the
deviations in Theorem 12 converge to 0. Plugging all this into our Proposition 5 shows that with
high probability,



np 



1 rr _1 

vol(G) d. 



< np ■ 2 



I-A2 



w„ 



O 
Proof of Corollary 14

Proof. The expected degree of each vertex is $n(p_{\text{within}} + p_{\text{between}})/2$, the expected volume of the
graph is $n^2 (p_{\text{within}} + p_{\text{between}})/2$. Writing $p = p_{\text{within}}$ and $q = p_{\text{between}}$, the matrix $A$ has the form
$\begin{pmatrix} pJ & qJ \\ qJ & pJ \end{pmatrix}$ where $J$ is the $(n/2 \times n/2)$-matrix of all ones. The expected degree of all vertices is
$n(p + q)/2$. Hence, $D^{-1/2} A D^{-1/2} = \frac{2}{n(p+q)} \cdot A$. This matrix has rank 2, its largest eigenvalue is 1
(with eigenvector the constant 1 vector), the other eigenvalue is $(p - q)/(p + q)$ with eigenvector
$(1, \dots, 1, -1, \dots, -1)$. Hence, the spectral gap in this model is $2q/(p + q)$.
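The eigenvalue computation for the planted partition model is easy to verify numerically. The sketch below builds the block matrix with entries $p$ within and $q$ between the two groups and computes the spectrum of its normalized version; the concrete values of `n`, `p` and `q` are arbitrary choices for illustration.

```python
import numpy as np

n, p, q = 10, 0.6, 0.2
half = n // 2
J = np.ones((half, half))

# Block matrix with within-cluster entries p and between-cluster entries q.
A = np.block([[p * J, q * J], [q * J, p * J]])

d = A.sum(axis=1)                       # every row sums to n (p + q) / 2
S = A / np.sqrt(np.outer(d, d))         # D^{-1/2} A D^{-1/2}
eigenvalues = np.sort(np.linalg.eigvalsh(S))[::-1]

spectral_gap = 1.0 - eigenvalues[1]
print(eigenvalues[:3], spectral_gap)
```

The two leading eigenvalues agree with $1$ and $(p-q)/(p+q)$, all remaining eigenvalues vanish up to rounding, and the gap equals $2q/(p+q)$.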

Under the assumption that $p = \omega(\log(n)/n)$, the deviations in Theorem 12 converge to 0. Plugging
the expected spectral gap in our bound in Proposition 5 shows that with high probability,



^(Pwithin Pbctwccn ) 

2 



vol(G)^*^' 



< 



^Pbctwccn ^(j^within ^^ Pbctwccn) 



= o 



\ ^Pbctwccn 
Proof of Corollary 15

Proof. We use the result from Theorem 4 in Chung and Radcliffe (2011), which states that if
the minimal expected degree $d_{\min}$ satisfies $d_{\min}/\log(n) \to \infty$, then with
probability at least $1 - 1/n$ the spectral gap is bounded by a term of the order $O(\log(2n)/d_{\min})$.
Plugging this in Proposition 5 shows that with high probability,



1 ^ i_ 

vol(G) di 



< 



I ^min 

\log{2n) 



O 



(log(2n) 
7 Discussion

We have presented different strategies to prove that in many large graphs the commute distance
can be approximated by $1/d_i + 1/d_j$. Both our approaches tell a similar story. Our result holds
as soon as there are "enough disjoint paths" between $i$ and $j$, compared to the size of the graph,
and the minimal degree is "large enough" compared to $n$.
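The approximation is easy to observe numerically. The following sketch, which is only an illustration and not part of the proofs, draws one Erdős-Rényi graph, computes all resistance distances $R_{ij}$ from the pseudo-inverse of the unnormalized graph Laplacian, and compares them with $1/d_i + 1/d_j$ (the commute time is $\mathrm{vol}(G) \cdot R_{ij}$, so the comparison is the same after rescaling). The graph size, edge probability and random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 0.5

# Symmetric 0/1 adjacency matrix of an Erdos-Renyi graph without self-loops.
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A = upper + upper.T

d = A.sum(axis=1)
L_pinv = np.linalg.pinv(np.diag(d) - A)       # pseudo-inverse of L = D - A

# Resistance distance: R_ij = L+_ii + L+_jj - 2 L+_ij.
diag = np.diag(L_pinv)
R = diag[:, None] + diag[None, :] - 2.0 * L_pinv
approx = 1.0 / d[:, None] + 1.0 / d[None, :]

off_diagonal = ~np.eye(n, dtype=bool)
relative_error = np.abs(R - approx)[off_diagonal] / approx[off_diagonal]
print(relative_error.max())
```

For a dense random graph of this size the largest relative deviation over all pairs is already small, which is the degeneracy discussed in this section: the resistance distance carries little information beyond the two degrees.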

We would like to point out that our results on the degeneracy of the hitting and commute times
are not due to pathologies such as a "misconstruction" of the graphs. For example, in the random
geometric graph setting the graph Laplacian can be proved to converge to the Laplace-Beltrami
operator on the underlying space under similar assumptions as the ones above (Hein et al. 2007).
But even though the Laplacian itself converges to a meaningful limit, the resistance distance, which
is computed based on point evaluations of the inverse of this Laplacian, does not converge to a
useful limit.

The limit distance function $\mathrm{dist}(i,j) = 1/d_i + 1/d_j$ is completely meaningless as a distance function.
It just considers the local density (the degree) at the two vertices, but does not take into account
any global property such as the cluster structure of the graph. As the speed of convergence is very
fast (for example, of the order $1/n$ in the case of Gaussian similarity graphs), the use of the raw
commute distance should be discouraged even on moderate-sized graphs. However, there might
be ways to extract useful information from the commute distance, namely in the form
of the remainder terms $S_{ij} - 1/d_i - 1/d_j$. Exploring this idea in depth is a project for future research.



There are two important classes of graphs that are not covered in our approach. In power law
graphs as well as in grid-like graphs, the minimal degree is constant, thus our results do not lead
to tight enough bounds. The resistance distances on grid-like graphs have been studied in some
particular cases. For example, Cserti (2000) and Wu (2004) prove explicit formulas for the
resistance on regular one- and two-dimensional grids, and Benjamini and Rossignol (2008) characterize
the variance of the resistance on random Bernoulli grids. To the best of our knowledge, general
results about the convergence of the resistance distance on grid-like graphs do not exist.
8 Appendix: General properties of random geometric graphs

In this appendix we collect some basic results on random geometric graphs. These results are
well-known, but we did not find any reference where the material is presented in the way we need
it (often the results are used implicitly or are tailored towards particular applications).

In the following, assume that $\mathcal{X} := \mathrm{supp}(p)$ is a valid region according to Definition 1. Recall the
definition of the boundary constant $\alpha$ in the valid region.

A convenient tool for dealing with random geometric graphs is the following well-known
concentration inequality for binomial random variables that first appeared in Angluin and Valiant
(1977).

Proposition 28 (Concentration inequalities) Let $N$ be a $\mathrm{Bin}(n,p)$-distributed random
variable. Then, for all $\delta \in\, ]0, 1]$,

$$P\big( N \leq (1 - \delta) np \big) \leq \exp\left( -\tfrac{1}{3} \delta^2 np \right),$$

$$P\big( N \geq (1 + \delta) np \big) \leq \exp\left( -\tfrac{1}{3} \delta^2 np \right).$$
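To get a feeling for how conservative these tail bounds are, the following sketch compares them with exact binomial tail probabilities computed from the probability mass function. Only the Python standard library is used; the function names and the parameter values are hypothetical choices for this illustration.

```python
from math import ceil, comb, exp, floor

def binomial_pmf(n, p, k):
    """P(N = k) for N ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)

def upper_tail(n, p, threshold):
    """P(N >= threshold) for N ~ Bin(n, p)."""
    return sum(binomial_pmf(n, p, k) for k in range(ceil(threshold), n + 1))

def lower_tail(n, p, threshold):
    """P(N <= threshold) for N ~ Bin(n, p)."""
    return sum(binomial_pmf(n, p, k) for k in range(0, floor(threshold) + 1))

n, p, delta = 200, 0.1, 0.5
mean = n * p
chernoff = exp(-delta**2 * mean / 3.0)      # right-hand side of both inequalities

upper = upper_tail(n, p, (1 + delta) * mean)
lower = lower_tail(n, p, (1 - delta) * mean)
print(upper, lower, chernoff)
```

Both exact tails lie well below the common bound, which is typical: the inequalities trade sharpness for a simple closed form that combines well with union bounds.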



We will see below that computing expected, minimum and maximum degrees in random geometric
graphs always boils down to counting the number of data points in certain balls in the space. The
following proposition is a straightforward application of the concentration inequality above and
serves as "template" for all later proofs.

Proposition 29 (Counting sample points) Consider a sample $X_1, \dots, X_n$ drawn i.i.d.
according to density $p$ on $\mathcal{X}$. Let $B_1, \dots, B_K$ be a fixed collection of subsets of $\mathcal{X}$ (the $B_i$ do not need
to be disjoint). Denote by $b_{\min} := \min_{i=1,\dots,K} \int_{B_i} p(x)\,dx$ the minimal probability mass of the sets
$B_i$ (similarly by $b_{\max}$ the maximal probability mass), and by $N_{\min}$ and $N_{\max}$ the minimal (resp.
maximal) number of sample points in the sets $B_i$. Then for all $\delta \in\, ]0, 1]$

$$P\big( N_{\max} \geq (1 + \delta) n b_{\max} \big) \leq K \cdot \exp(-\delta^2 n b_{\max}/3),$$

$$P\big( N_{\min} \leq (1 - \delta) n b_{\min} \big) \leq K \cdot \exp(-\delta^2 n b_{\min}/3).$$

Proof. This is a straightforward application of Proposition 28 using the union bound. □



When working with $\varepsilon$-graphs or kNN-graphs, we often need to know the degrees of the vertices. As
a rule of thumb, the expected degree of a vertex in the $\varepsilon$-graph is of the order $\Theta(n\varepsilon^d)$, the expected
degree of a vertex in both the symmetric and mutual kNN-graph is of the order $\Theta(k)$. The expected
kNN-distance is of the order $\Theta((k/n)^{1/d})$. Provided the graph is "sufficiently connected", all these
rules of thumb also apply to the minimal and maximal values of these quantities. The following
propositions make these rules of thumb explicit.

Proposition 30 (Degrees in the $\varepsilon$-graph) Consider an $\varepsilon$-graph on a valid region $\mathcal{X} \subset \mathbb{R}^d$.

1. Then, for all $\delta \in\, ]0, 1]$, the minimal and maximal degrees in the $\varepsilon$-graph satisfy

$$P\big( d_{\max} \geq (1 + \delta) n\varepsilon^d p_{\max} \eta_d \big) \leq n \cdot \exp(-\delta^2 n\varepsilon^d p_{\max} \eta_d/3),$$

$$P\big( d_{\min} \leq (1 - \delta) n\varepsilon^d p_{\min} \eta_d \alpha \big) \leq n \cdot \exp(-\delta^2 n\varepsilon^d p_{\min} \eta_d \alpha/3).$$

In particular, if $n\varepsilon^d/\log n \to \infty$, then these probabilities converge to 0 as $n \to \infty$.

2. If $n \to \infty$, $\varepsilon \to 0$ and $n\varepsilon^d/\log n \to \infty$, and the density $p$ is continuous, then for each interior
point $X_i \in \mathcal{X}$ the degree is a consistent density estimate: $d_i/(n\varepsilon^d \eta_d) \to p(X_i)$ a.s.

Proof. Part 1 follows by applying Proposition 29 to the balls of radius $\varepsilon$ centered at the data
points. Note that for the bound on $d_{\min}$ we need to take into account boundary effects as only
a part of the $\varepsilon$-ball around a boundary point is contained in $\mathcal{X}$. This is where the constant $\alpha$
comes in (recall the definition of $\alpha$ from the definition of a valid region). Part 2 is a standard
density estimation argument: the expected degree of $X_i$ is the expected number of points in the
$\varepsilon$-ball around $X_i$. For $\varepsilon$ small enough, the $\varepsilon$-ball around $X_i$ is completely contained in $\mathcal{X}$ and the
density is approximately constant on this ball because we assumed the density to be continuous.
The expected number of points is approximately $n\varepsilon^d \eta_d\, p(X_i)$ where $\eta_d$ denotes the volume of a
d-dimensional unit ball. The result now follows from Part 1. □
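Part 2 can be illustrated with a small simulation. The sketch below assumes the uniform density $p \equiv 1$ on the unit square (so $d = 2$ and $\eta_d = \pi$) and compares the rescaled degrees $d_i/(n\varepsilon^d\eta_d)$ of interior points with the true density value 1. The sample size, radius and seed are arbitrary choices, and "interior" is taken to mean "at distance at least $\varepsilon$ from the boundary".

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 2000, 2, 0.1
X = rng.random((n, d))                       # uniform density p = 1 on [0, 1]^2

# Squared pairwise distances via |x|^2 + |y|^2 - 2 <x, y>.
sq = (X ** 2).sum(axis=1)
dist2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T

adjacent = dist2 <= eps ** 2                 # edges of the eps-graph
np.fill_diagonal(adjacent, False)            # no self-loops
degrees = adjacent.sum(axis=1)

eta_d = np.pi                                # volume of the unit ball in R^2
estimate = degrees / (n * eps ** d * eta_d)  # degree-based density estimate

# Points whose eps-ball lies entirely inside the square.
interior = np.all((X >= eps) & (X <= 1 - eps), axis=1)
print(estimate[interior].mean(), estimate[~interior].mean())
```

The average estimate over interior points is close to the true density, while points near the boundary have systematically smaller values; this is the boundary effect that the constant $\alpha$ accounts for in Part 1.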



Recall the definitions of the $k$-nearest neighbor radii: $R_k(x)$ denotes the distance of $x$ to its
$k$-nearest neighbor among the $X_i$, and the maximum and minimum values are denoted $R_{k,\max} :=
\max_{i=1,\dots,n} R_k(X_i)$ and $R_{k,\min} := \min_{i=1,\dots,n} R_k(X_i)$. Also recall the definition of the boundary
constant $\alpha$ from the definition of a valid region.

Proposition 31 (Degrees in the kNN-graph) Consider a valid region $\mathcal{X} \subset \mathbb{R}^d$.

1. Define the constants $a = 1/(2 p_{\max} \eta_d)^{1/d}$ and $\tilde a := 2/(p_{\min} \eta_d \alpha)^{1/d}$. Then

$$P\left( R_{k,\min} \leq a \left( \frac{k}{n} \right)^{1/d} \right) \leq n \exp(-k/3),$$

$$P\left( R_{k,\max} \geq \tilde a \left( \frac{k}{n} \right)^{1/d} \right) \leq n \exp(-k/12).$$

If $n \to \infty$ and $k/\log n \to \infty$, then these probabilities converge to 0.

2. Moreover, with probability at least $1 - n\exp(-c_4 k)$ the minimal and maximal degree in both
the symmetric and mutual kNN-graph are of the order $\Theta(k)$ (the constants differ).

3. If the density is continuous, $n \to \infty$, $k/\log n \to \infty$ and additionally $k/n \to 0$, then in both
the symmetric and the mutual kNN-graph, the degree of any fixed vertex $v_i$ in the interior of
$\mathcal{X}$ satisfies $k/d_i \to 1$ a.s.

Proof. Part 1. Define the constant $a = 1/(2 p_{\max} \eta_d)^{1/d}$ and the radius $r := a (k/n)^{1/d}$, fix
a sample point $x$, and denote by $\mu(x)$ the probability mass of the ball around $x$ with radius $r$.
Set $\mu_{\max} := r^d \eta_d\, p_{\max} \geq \max_{x \in \mathcal{X}} \mu(x)$. Note that $\mu_{\max} < 1$. Observe that $R_k(x) \leq r$ if and
only if there are at least $k$ data points in the ball of radius $r$ around $x$. Let $M \sim \mathrm{Bin}(n, \mu)$ and
$V \sim \mathrm{Bin}(n, \mu_{\max})$. Note that by the choices of $a$ and $r$ we have $E(V) = k/2$. All this leads to

$$P\big( R_k(x) \leq r \big) \leq P\big( M \geq k \big) \leq P\big( V \geq k \big) = P\big( V \geq 2E(V) \big).$$

Applying the concentration inequality of Proposition 28 (with $\delta := 1$) and using a union bound
leads to the following result for the minimal kNN-radius:

$$P\left( R_{k,\min} \leq a \left( \frac{k}{n} \right)^{1/d} \right) \leq n \max_{i=1,\dots,n} P\big( R_k(X_i) \leq r \big) \leq n \exp(-k/3).$$

By a similar approach we can prove the analogous statement for the maximal kNN-radius. Note
that for the bound on $R_{k,\max}$ we additionally need to take into account boundary effects: at the
boundary of $\mathcal{X}$, only a part of the ball around a point is contained in $\mathcal{X}$, which affects the value
of $\mu_{\min}$. We thus define $\tilde a := 2/(p_{\min} \eta_d \alpha)^{1/d}$, $r := \tilde a (k/n)^{1/d}$, $\mu_{\min} := r^d \eta_d\, p_{\min} \alpha$ where $\alpha \in\, ]0,1]$
is the constant defined in the valid region. With $V \sim \mathrm{Bin}(n, \mu_{\min})$ with $EV = 2k$ we continue
similarly to above and get (using $\delta = 1/2$)

$$P\left( R_{k,\max} \geq \tilde a \left( \frac{k}{n} \right)^{1/d} \right) \leq n \exp(-k/12).$$

Part 2. In the directed kNN-graph, the degree of each vertex is exactly $k$. Thus, in the mutual
kNN-graph, the maximum degree over all vertices is upper bounded by $k$, in the symmetric
kNN-graph the minimum degree over all vertices is lower bounded by $k$.






For the symmetric graph, observe that the maximal degree in the graph is bounded by the maximal
number of points in the balls of radius $R_{k,\max}$ centered at the data points. We know that with
high probability, a ball of radius $R_{k,\max}$ contains of the order $\Theta(n R_{k,\max}^d)$ points. Using Part 1 we
know that with high probability, $R_{k,\max}$ is of the order $(k/n)^{1/d}$. Thus the maximal degree in the
symmetric kNN-graph is of the order $\Theta(k)$, with high probability.

In the mutual graph, observe that the minimal degree in the graph is bounded by the minimal
number of points in the balls of radius $R_{k,\min}$ centered at the data points. Then the statement
follows analogously to the last one.

Part 3, proof sketch. Consider a fixed point $x$ in the interior of $\mathcal{X}$. We know that both in the
symmetric and mutual kNN-graph, two points cannot be connected if their distance is larger than
$R_{k,\max}$. As we know that $R_{k,\max}$ is of the order $(k/n)^{1/d}$, under the growth conditions on $n$ and
$k$ this radius becomes arbitrarily small. Thus, because of the continuity of the density, if $n$ is
large enough we can assume that the density in the ball $B(x, R_{k,\max})$ of radius $R_{k,\max}$ around $x$ is
approximately constant. Thus, all points $y \in B(x, R_{k,\max})$ have approximately the same expected
$k$-nearest neighbor radius $R := (k/(n \cdot p(x)\eta_d))^{1/d}$. Moreover, by concentration arguments it is
easy to see that the actual kNN-radii only deviate by a factor $1 \pm \delta$ from their expected values.
Then, with high probability, all points inside of $B(x, R(1 - \delta))$ are among the $k$ nearest neighbors
of $x$, and all $k$ nearest neighbors of $x$ are inside $B(x, R(1 + \delta))$. On the other hand, with high
probability $x$ is among the $k$ nearest neighbors of all points $y \in B(x, R(1 - \delta))$, and not among
the $k$ nearest neighbors of any point outside of $B(x, R(1 + \delta))$. Hence, in the mutual kNN-graph,
with high probability $x$ is connected exactly to all points $y \in B(x, R(1 - \delta))$. In the symmetric
kNN-graph, $x$ might additionally be connected to the points in $B(x, R(1 + \delta)) \setminus B(x, R(1 - \delta))$.
By construction, with high probability the number of sample points in these balls is $(1 + \delta)k$ and
$(1 - \delta)k$. Driving $\delta$ to 0 leads to the result. □
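The bounds on the kNN-radii in Part 1 can be illustrated by simulation. The sketch below samples from the uniform density on the unit square ($p_{\min} = p_{\max} = 1$, $\eta_d = \pi$) and compares the observed $R_{k,\min}$ and $R_{k,\max}$ with $a(k/n)^{1/d}$ and $\tilde a(k/n)^{1/d}$. The value $\alpha = 1/4$ is an assumption made for this example (a small ball centered at a corner of the square keeps a quarter of its volume), as are the sample size, $k$ and the seed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 2, 100
X = rng.random((n, d))                       # uniform density on [0, 1]^2

# Squared pairwise distances, clipped at 0 to remove rounding noise.
sq = (X ** 2).sum(axis=1)
dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
np.fill_diagonal(dist2, np.inf)              # a point is not its own neighbor

# The k-th smallest entry of every row is the squared kNN-radius R_k(X_i)^2.
R_k = np.sqrt(np.partition(dist2, k - 1, axis=1)[:, k - 1])

eta_d, p_min, p_max = np.pi, 1.0, 1.0
alpha = 0.25                                 # assumed boundary constant of the square
a = 1.0 / (2.0 * p_max * eta_d) ** (1.0 / d)
a_tilde = 2.0 / (p_min * eta_d * alpha) ** (1.0 / d)
scale = (k / n) ** (1.0 / d)

print(a * scale, R_k.min(), R_k.max(), a_tilde * scale)
```

In this run all kNN-radii lie strictly between the two deterministic thresholds, in line with the statement that both exceptional events have small probability once $k$ is large compared to $\log n$.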